@ReubenBond
Created July 20, 2024 02:33
Vsync v2.2.2095 by Ken Birman. Extracted from https://codeplexarchive.org/codeplex/project/vsync
// <copyright file="Vsync.cs" company="Kenneth P. Birman">
// Vsync System, V2.2.$Rev: 2095 $, Developed by Kenneth P. Birman, (c) 2010 - 2014. All rights reserved.
// This code is subject to copyright and other intellectual property restrictions and
// may be used only under license from Dr. Birman or his designated agents.
//
// It is a violation of international intellectual property protection laws to remove this notice.
//
// Dr. Birman has elected to grant Cornell University and other researchers the right to experiment with Vsync and
// to create a derived version Vsync V2.2.xxxx (and other future versions, using the same version numbering scheme)
//
// Dr. Birman and Cornell University intend to place Vsync V1.x.xxxx and V2.x.xxxx into the public domain, in the
// manner described below, and subject to licensing provisions intended to protect Dr. Birman and Cornell from
// any responsibility for consequences of the use of this technology, even when used precisely as intended.
//
// Vsync is a free research product, created at Cornell University by Dr. Birman with some help from his
// students. It has not been subjected to a professional quality assurance or professional testing process
// of the kind normally used in life or safety-critical systems, although we have used it fairly extensively
// for several years now. But this implies that if you intend to use Vsync in such a setting, you take on
// an obligation to do a stringent level of testing, to gain the maximum assurance feasible that the system,
// as it will be used in your target setting, achieves the properties required for safe and correct operations.
// Cornell can't warrant the appropriateness or correctness of Vsync for these kinds of very demanding uses.
// Only a high quality of analysis of the target setting, the proposed use, and then a very thorough quality
// assurance process can justify that sort of application. Further, we recommend very strongly that you design
// for "fail safe" operations: if the application that uses Vsync shuts down (for example due to network
// partitioning failures of a kind Vsync can't handle automatically), your end-user should still be promised
// safety.
//
// Vsync is a renamed version of Isis2, with the name changed intended to distance our work from any association
// with the terrorist group that uses the Isis name. No other change was made: this code is just the Isis2 code
// with a global edit to rename everything called ISIS as VSYNC, Isis as Vsync, and Isis2 as Vsync.
//
// Licensing terms for Vsync v1.x.xxxx and v2.x.xxxx use freeBSD license language and include indemnification
// whereby the end user holds Dr. Birman and Cornell harmless for any and all uses of this technology. Dr.
// Birman and Cornell University are not aware of any external patents, copyrights or trademarks
// that might be required by users of the Vsync system, but have not conducted a
// thorough patent search and are not able to guarantee that no such licenses exist. The end user accepts
// full responsibility to obtain any needed licenses required for the conduct of their work with this
// system, and holds Cornell harmless even in the event that Vsync itself is found to infringe some
// patent, copyright or trademark. Cornell holds no patents that read on the Vsync technology and is not
// planning to seek patents on this work, nor is Dr. Birman personally. Indeed, because of the extensive
// track record of publications in this area over a period of some 30 years, it seems unlikely that patents
// could be obtained on the technology by any party. Nonetheless, Cornell (and Dr. Birman) cannot guarantee
// that this technology doesn't infringe any existing or future patent. Should licenses be required on any
// existing or future technology, the user of Vsync, and not Dr. Birman or Cornell, has sole responsibility
// to negotiate such licenses and to pay any associated fees.
//
// Source code access may, in some cases, be granted to facilitate code maintenance. Contact Dr. Birman
// for details.
//
// Dr. Birman and Cornell assert no ownership interest of any kind in applications developed using Vsync,
// as distinct from the Vsync system per-se. Example applications included in the Vsync documentation may be
// copied, modified and incorporated into end-user applications without limitation or restriction: they are
// treated as public domain material.
//
// At the present time, Dr. Birman and his group at Cornell are providing support for this technology. No
// commitment, implied or explicit, is made to resolve any particular issue or to fix any particular bug.
// Like any complex technology, the user should expect Vsync to have some bugs, perhaps serious, and should
// be aware that precisely because this is a free technology, support may be slow, frustrating, and may
// not lead to a successful resolution of the issue. Vsync is not a product, and there are no current plans
// to create a commercial product in this space.
//
// Dr. Birman reserves version numbers in the v1.x.xxxx range for versions of the system created by him
// privately, outside of his Cornell employment. Vsync versions in the v2.x.xxxx range are reserved
// for binary and code releases by the Cornell group headed by Dr. Birman and Dr. Robbert van Renesse.
//
// Please contact Dr. Birman at 607-255-9199 if you have any questions about the use of this technology.
// His email is ken@cs.cornell.edu and his web site is http://www.cs.cornell.edu/ken
//
// *******************************************************************************************
// A few comments on coding conventions used here
// First, this code is extremely multithreaded and the threading and protection logic is key to correctness
// Don't understand that logic? Don't touch this code!
// The C# lock statement didn't mix well with my use of thread priorities; use
// using(LockAndElevate(lock-object)) { ... }
// Vsync has its own locking infrastructure for coarse-grained long-lived locks; use these for that sort of thing
// ********************************************************************************************
// </copyright>
////#define PROTOCOL_BUFFERS // Enable Google's Protocol Buffers support via protobuf-net.
////#define __MonoCS__ // Similarly, should be automatically set, needed when compiling for C# on Linux via Mono
// The DEFINEs that follow are useful in debugging but not for production runs. Some can be VERY slow
//#define TRACKLOCKINFO // If defined, tracks lock information, warns about apparent deadlocks or priority inversions
//#define TRACKLOCKINGORDER // If both are defined, watches for potential lock ordering issues (e.g. usually any single thread locks A before B, but now B was locked, then A). Broken in .NET4
//#define EXTRACTCALLSTACKS // A risky and expensive .NET mechanism that can extract the call stack but sometimes seems to trigger deadlocks or other exceptions
//#define WARN_ON_LONG_DELAYS // Useful if locking problems are suspected in message handlers
// Because Debug is a constant, we disable what would otherwise be 1000s of warnings
#pragma warning disable 0162
#pragma warning disable 0429
namespace Vsync
{
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Linq;
using System.Linq.Expressions;
using System.Net;
using System.Net.NetworkInformation;
using System.Net.Sockets;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Security;
using System.Security.Cryptography;
using System.Security.Permissions;
using System.Text;
using System.Threading;
#if PROTOCOL_BUFFERS
using ProtoBuf;
using ProtoBuf.Meta;
#endif
/// <summary>
/// General purpose exception
/// </summary>
/// <remarks>The fatal exceptions also cause Vsync to shut down</remarks>
[Serializable]
public class VsyncException : Exception
{
/// <summary>
/// General purpose exception constructor
/// </summary>
/// <param name="s">Reason for the exception</param>
public VsyncException(string s)
: base(s)
{
if (!VsyncSystem.VsyncActive && VsyncSystem.VsyncWasActive)
{
return;
}
VsyncSystem.WriteAckInfo();
VsyncSystem.Shutdown(s);
}
/// <summary>
/// General purpose exception constructor
/// </summary>
/// <param name="s">String describing the problem</param>
/// <param name="e">Inner exception that caused this exception</param>
public VsyncException(string s, Exception e)
: base(s, e)
{
if (!VsyncSystem.VsyncActive && VsyncSystem.VsyncWasActive)
{
return;
}
VsyncSystem.WriteAckInfo();
VsyncSystem.Shutdown(s);
}
}
/// <summary>
/// General purpose Client exception
/// </summary>
[Serializable]
public class VsyncClientException : Exception
{
/// <summary>
/// General purpose exception constructor
/// </summary>
/// <param name="s">Reason for the exception</param>
public VsyncClientException(string s)
: base(s)
{
}
}
/// <summary>
/// General purpose DHT exception
/// </summary>
[Serializable]
public class VsyncDHTException : Exception
{
/// <summary>
/// General purpose exception constructor
/// </summary>
/// <param name="s">Reason for the exception</param>
public VsyncDHTException(string s)
: base(s)
{
}
}
/// <summary>
/// Thrown by disklogger when SafeSend is employed without first setting the SafeSendThreshold, or
/// if the number of members of the group isn't at least SafeSendThreshold when multicasts are sent
/// </summary>
[Serializable]
public class SafeSendException : Exception
{
/// <summary>
/// General purpose exception constructor
/// </summary>
/// <param name="s">Reason for the exception</param>
public SafeSendException(string s)
: base(s)
{
VsyncSystem.WriteAckInfo();
VsyncSystem.Shutdown(s);
}
}
/// <summary>
/// Thrown when Vsync rejects a message because of a signature problem
/// </summary>
/// <remarks>The fatal exceptions also cause Vsync to shut down</remarks>
[Serializable]
public class RejectedMessageException : Exception
{
/// <summary>
/// Message rejected by Vsync because of a signature problem
/// </summary>
/// <param name="s">Reason for the exception</param>
public RejectedMessageException(string s)
: base(s)
{
}
}
/// <summary>
/// Aggregation failure exception
/// </summary>
[Serializable]
public class AggregationFailedException : Exception
{
private readonly int reasonCode;
/// <summary>
/// Aggregation failure exception constructor
/// </summary>
public AggregationFailedException(int reasonCode)
{
this.reasonCode = reasonCode;
}
/// <exclude></exclude>
protected AggregationFailedException(SerializationInfo info, StreamingContext context)
: base(info, context)
{
this.reasonCode = (int)info.GetValue("reasonCode", typeof(int));
}
/// <exclude></exclude>
public int ReasonCode
{
get
{
return this.reasonCode;
}
}
/// <exclude></exclude>
[SecurityPermission(SecurityAction.LinkDemand, Flags = SecurityPermissionFlag.SerializationFormatter)]
public override void GetObjectData(SerializationInfo info, StreamingContext context)
{
base.GetObjectData(info, context);
info.AddValue("reasonCode", this.reasonCode);
}
}
/// <summary>
/// Thrown if an AbortReply is received
/// </summary>
[Serializable]
public class VsyncAbortReplyException : Exception
{
/// <summary>
/// Constructor for AbortReply exceptions
/// </summary>
/// <param name="s">Reason for the exception</param>
public VsyncAbortReplyException(string s)
: base(s)
{
}
}
/// <summary>
/// Thrown if a SafeSend can't complete because the group has fewer than SafeSendThreshold members
/// </summary>
[Serializable]
public class VsyncSafeSendException : Exception
{
/// <summary>
/// Constructor for SafeSend exceptions
/// </summary>
/// <param name="s">Reason for the exception</param>
public VsyncSafeSendException(string s)
: base(s)
{
}
}
[Serializable]
internal class MCMDException : Exception
{
public MCMDException(string s)
: base(s)
{
VsyncSystem.Shutdown(s);
}
}
/// <summary>
/// Thrown if a Join-only operation can't find a preexisting group with the right name or address
/// </summary>
[Serializable]
public class GroupNotFoundException : Exception
{
/// <summary>
/// Constructor for GroupNotFoundException exceptions
/// </summary>
/// <param name="s">Reason for the problem</param>
public GroupNotFoundException(string s)
: base(s)
{
VsyncSystem.Shutdown(s);
}
}
/// <summary>
/// Thrown when Vsync is shutting down
/// </summary>
[Serializable]
public class VsyncShutdownException : Exception
{
/// <summary>
/// Constructor for VsyncShutdownException exceptions
/// </summary>
/// <param name="s">Reason Vsync shut itself down</param>
public VsyncShutdownException(string s)
: base(s)
{
if (VsyncSystem.shuttingDown)
{
VsyncSystem.AbandonShip();
}
VsyncSystem.Shutdown();
}
}
/// <summary>
/// Type signature for a Vsync ViewHandler callback method
/// </summary>
/// <param name="v"></param>
public delegate void ViewHandler(View v);
/// <summary>
/// Type signature for a Vsync universal callback method
/// </summary>
/// <param name="objs"></param>
public delegate void UCallback(object[] objs);
/// <summary>
/// Type signature for a Vsync ChkptMaker callback method. Creates new checkpoints
/// </summary>
/// <param name="v"></param>
public delegate void ChkptMaker(View v);
/// <summary>
/// If defined, returns true if this member will send the checkpoint to the joiners, and false if not.
/// </summary>
/// <param name="v">The new view of the group</param>
/// <param name="who">A specific joiner for whom the question is being posed</param>
/// <returns>True if the member in which the call was done is responsible for sending the checkpoint for the specified joiner.</returns>
/// <remarks>By default, when a new view is defined all members evaluate v.IAmLeader() and the (single) leader creates the checkpoint used to initialize (all) the joiners.
/// That is, a single checkpoint is made, by a single member, and a copy is delivered to each joiner. However, you can override this behavior if desired.
/// When you do so, by defining the ChkptChoser(), Vsync does a parallel evaluation of this method in every current group member. For each joiner, a single
/// member will be picked to send a checkpoint to it. All others must return false. If nobody returns true, the joiner will hang waiting for
/// a checkpoint that will never arrive. If a failure disrupts the checkpoint transfer, the joiner will throw a "Join failed" exception.
/// </remarks>
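/// <example>
/// A minimal sketch that reproduces the default policy described above: only the view leader returns true,
/// so exactly one member sends the checkpoint to each joiner. It assumes v.IAmLeader() returns a bool, as the
/// remarks imply. The variable name is hypothetical, and how the delegate is registered with a group is not
/// shown in this part of the file.
/// <code>
/// ChkptChoser leaderSends = (View v, Address who) =&gt; v.IAmLeader();
/// </code>
/// </example>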
public delegate bool ChkptChoser(View v, Address who);
/// <summary>
/// Type signature for a Vsync group initializer callback method
/// </summary>
public delegate void Initializer();
/// <summary>
/// Type signature for a Vsync Timer callback handler method
/// </summary>
public delegate void TimerCallback();
/// <summary>
/// Type signature for a Vsync Watch callback handler method
/// </summary>
/// <param name="ev">The event (W_JOIN or W_LEAVE)</param>
public delegate void Watcher(int ev);
/// <summary>
/// Signature for the callback in a master when a new worker registers via RunAsWorker()
/// </summary>
/// <param name="who">Address of the worker that registered</param>
public delegate void NewWorker(Address who);
/// <summary>
/// Type signature for a SafeSend logging method, used to ensure the durability of SafeSend multicasts
/// </summary>
/// <param name="m">A pending SafeSend message, not yet committed for delivery</param>
public delegate void durabilityMethod(Msg m);
/// <summary>
/// Callback on broken lock
/// </summary>
/// <param name="why">LOCK_TRANSFER (to rank 0 member) or LOCK_BROKEN</param>
/// <param name="name">Lock name</param>
/// <param name="holder">Previous lock holder</param>
public delegate void LockBroken(int why, string name, Address holder);
/// <summary>
/// Signature for the Vsync logging callback
/// </summary>
/// <param name="LEvent">Event types, as defined in Group</param>
/// <param name="StartorDone">IL_START or IL_DONE</param>
/// <param name="sender">The process that initiated this operation</param>
/// <param name="LId">An identifier to match START and DONE events and track queries</param>
/// <param name="args">Parameters for this type of event</param>
public delegate void ILFunc(int LEvent, int StartorDone, Address sender, long LId, params object[] args);
internal class Callable
{
internal int nParams;
internal Type[] ptypes;
internal Delegate hisCb;
internal Delegate cb;
internal Callable(Delegate hisCb)
{
this.hisCb = hisCb;
MethodInfo mi = hisCb.GetType().GetMethod("Invoke");
ParameterInfo[] pi = mi.GetParameters();
this.ptypes = pi.Select(p => p.ParameterType).ToArray();
this.nParams = this.ptypes.Length;
#if !(__MonoCS__ || ___ANDROID__) // VN
try
{
if (this.nParams == 0)
{
this.cb = Delegate.CreateDelegate(typeof(Action), hisCb.Target, hisCb.Method, false);
}
else if (this.nParams <= 16)
{
this.cb = Delegate.CreateDelegate(Expression.GetActionType(this.ptypes), hisCb.Target, hisCb.Method, false);
if (this.cb == null)
{
Type rtype = hisCb.GetType().GetMethod("Invoke").ReturnType;
// Callback to a function method; used in the DHTPutCollisionResolver logic
Vsync.ArrayResize(ref this.ptypes, this.ptypes.Length + 1);
this.ptypes[this.ptypes.Length - 1] = rtype;
this.cb = Delegate.CreateDelegate(Expression.GetFuncType(this.ptypes), hisCb.Target, hisCb.Method, false);
}
}
}
catch (ArgumentException e)
{
// If we get a "MethodInfo must be a runtime MethodInfo object." error, fall back to the old way; otherwise propagate the exception
if (e.ParamName == null || !e.ParamName.Equals("method", StringComparison.Ordinal))
{
throw;
}
}
#else
this.cb = null;
#endif
}
internal void doUpcall(object[] args)
{
if (args.Length != this.nParams)
{
throw new ArgumentException("Argument count must match number of parameters");
}
#if !(__MonoCS__ || ___ANDROID__) // VN
if (this.cb != null)
{
switch (this.nParams)
{
case 0:
((dynamic)this.cb).Invoke();
break;
case 1:
((dynamic)this.cb).Invoke((dynamic)args[0]);
break;
case 2:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1]);
break;
case 3:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2]);
break;
case 4:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3]);
break;
case 5:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4]);
break;
case 6:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5]);
break;
case 7:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6]);
break;
case 8:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7]);
break;
case 9:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8]);
break;
case 10:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9]);
break;
case 11:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9], (dynamic)args[10]);
break;
case 12:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9], (dynamic)args[10], (dynamic)args[11]);
break;
case 13:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9], (dynamic)args[10], (dynamic)args[11], (dynamic)args[12]);
break;
case 14:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9], (dynamic)args[10], (dynamic)args[11], (dynamic)args[12], (dynamic)args[13]);
break;
case 15:
((dynamic)this.cb).Invoke((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9], (dynamic)args[10], (dynamic)args[11], (dynamic)args[12], (dynamic)args[13], (dynamic)args[14]);
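break;
default:
// Arities with no explicit case above (e.g. 16 parameters, for which a typed delegate can still be created) fall back to late-bound invocation
this.hisCb.DynamicInvoke(args);
break;
}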
}
else
{
MethodInfo mi = this.hisCb.GetType().GetMethod("Invoke");
mi.Invoke(this.hisCb, args);
}
#else
this.hisCb.DynamicInvoke(args);
#endif
}
}
/// <summary>
/// Specifies the marshalling mechanism.
/// </summary>
internal enum MarshallingMechanism
{
/// <summary>
/// None specified.
/// </summary>
None,
/// <summary>
/// Use auto marshalling.
/// </summary>
AutoMarshalled,
/// <summary>
/// Use self marshalling via the ISelfMarshalled interface.
/// </summary>
SelfMarshalled,
/// <summary>
/// Use protocol buffers for marshalling.
/// </summary>
ProtocolBuffers
}
/// <summary>
/// Designates a class as suitable for automatic marshalling via Vsync.
/// </summary>
/// <remarks>
/// When the <bf>AutoMarshalled</bf> attribute is specified, Vsync will automatically marshall the fields of
/// a class into a byte[] array, and create and initialize object instances from received byte[] arrays.
/// </remarks>
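/// <example>
/// A minimal sketch of an auto-marshalled class. The class and field names are hypothetical, and the exact
/// requirements (for instance, which field types and constructors are supported) are not stated in this part
/// of the file, so treat this as an illustration of the attribute's placement only.
/// <code>
/// [AutoMarshalled]
/// public class StockQuote
/// {
///     public string Symbol;
///     public double Price;
/// }
/// </code>
/// </example>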
[AttributeUsage(AttributeTargets.Class)]
public sealed class AutoMarshalledAttribute : Attribute
{
}
/// <summary>
/// Instructs Vsync that a class implements its own marshalling.
/// </summary>
/// <remarks>
/// The class must implement a public constructor accepting a single byte array to deserialize and initialize new instances.
/// </remarks>
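/// <example>
/// A minimal sketch of a self-marshalled type. The class name, its fields and the 8-byte layout are
/// hypothetical; only toBArray() and the public byte[] constructor come from the description above.
/// <code>
/// public class Point2D : ISelfMarshalled
/// {
///     public int X;
///     public int Y;
///
///     public Point2D(int x, int y)
///     {
///         this.X = x;
///         this.Y = y;
///     }
///
///     // Deserializing constructor: rebuilds the instance from the bytes produced by toBArray().
///     public Point2D(byte[] ba)
///     {
///         this.X = BitConverter.ToInt32(ba, 0);
///         this.Y = BitConverter.ToInt32(ba, 4);
///     }
///
///     // Serializes X then Y as two 4-byte integers in the platform's byte order.
///     public byte[] toBArray()
///     {
///         byte[] ba = new byte[8];
///         BitConverter.GetBytes(this.X).CopyTo(ba, 0);
///         BitConverter.GetBytes(this.Y).CopyTo(ba, 4);
///         return ba;
///     }
/// }
/// </code>
/// </example>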
public interface ISelfMarshalled
{
/// <summary>
/// Serializes the instance to a byte array.
/// </summary>
/// <returns>The instance serialized to a byte array.</returns>
byte[] toBArray();
}
#if !PROTOCOL_BUFFERS
/// <summary>
/// Dummy attribute.
/// </summary>
[AttributeUsage(AttributeTargets.Field)]
internal sealed class ProtoMember : Attribute
{
internal ProtoMember(int index)
{
}
}
#endif
/// <summary>
/// Interface definition for Vsync aggregators
/// </summary>
/// <remarks>
/// To define a custom Aggregator, implement the IAggregator interface, create an instance for each group in which you will use it,
/// and register that instance via Group.RegisterAggregator. If you need to have multiple instances of one aggregator
/// for a single group, encode an instance identifier of some kind into the key; Vsync won't allow you to register more than one
/// aggregator of a single type in a single group.
/// </remarks>
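/// <example>
/// A minimal sketch of a custom aggregator that sums values per key. The class name is hypothetical, and
/// registration through Group.RegisterAggregator is omitted because its signature is not shown in this part of the file.
/// <code>
/// public class SumAggregator : IAggregator&lt;int, double&gt;
/// {
///     // Combines the value received from the left neighbor with the value coming up from below.
///     public double Aggregate(int key, double fromLeft, double fromBelow)
///     {
///         return fromLeft + fromBelow;
///     }
/// }
/// </code>
/// </example>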
/// <typeparam name="KeyType">The KeyType must be a primitive C# base type, or must implement the IEqualityComparer interface</typeparam>
/// <typeparam name="ValueType"></typeparam>
public interface IAggregator<in KeyType, ValueType>
{
/// <summary>
/// Combines two objects of type ValueType, which are associated with the given key. Called from a Vsync thread.
/// </summary>
/// <param name="key">The key associated with this aggregation operation</param>
/// <param name="fromLeft">Value received from the node to my left on this token ring</param>
/// <param name="fromBelow">Value coming up from below</param>
/// <returns></returns>
ValueType Aggregate(KeyType key, ValueType fromLeft, ValueType fromBelow);
}
/// <summary>
/// AggregatorDel is a delegate type that can be used to do in-line anonymous declaration of Vsync aggregators in calls to Group.RegisterAggregator
/// </summary>
/// <typeparam name="KType">The associated KeyType</typeparam>
/// <typeparam name="Vtype">The associated ValueType</typeparam>
/// <param name="key">The key</param>
/// <param name="fromLeft">A value received from the node to the left</param>
/// <param name="fromBelow">A value received from the node below</param>
/// <returns>An aggregator function should return a ValueType object computed as the aggregate of fromLeft and fromBelow</returns>
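/// <example>
/// A minimal sketch of an in-line aggregator that keeps the larger of the two incoming values. The variable
/// name is hypothetical, and the call to Group.RegisterAggregator is omitted because its signature is not shown here.
/// <code>
/// Aggregator&lt;int, int&gt; maxOfBoth = (key, fromLeft, fromBelow) =&gt; Math.Max(fromLeft, fromBelow);
/// </code>
/// </example>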
public delegate Vtype Aggregator<in KType, Vtype>(KType key, Vtype fromLeft, Vtype fromBelow);
// This is used to "trick" C# into doing subtyping on my derived IAggregator objects
internal interface IAggregateEventHandler
{
// In fact, three of the events return null, but the fourth fetches the marshalled aggregate state and returns byte[]
byte[] AggEvent(int eventType, int vid, object key, object value, int offset);
byte[] AggEvent(int eventType);
string AggName();
string AggState();
object GetDValues();
void LoadDValues(object fromBelow);
void GotSGAggInfo(bool fromBelow, int level, int vid, object key, object value);
Type GetKeyType();
Type GetValueType();
}
/// <summary>
/// Specifies how long a Query should wait for results and the action to take on timeout
/// </summary>
/// <remarks>
/// Vsync employs <bf>Timeout</bf> objects to specify desired actions when a Query has waited excessively long.
/// These specify the delay until timeout (in milliseconds) and an action to take.
/// </remarks>
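/// <example>
/// A minimal sketch using only the public constructor and constants declared in this class: a timeout that
/// expires after one second and aborts the Query as if AbortReply() had been called. The variable name is
/// hypothetical, and passing the object to a Query call is omitted because that API is not shown in this part
/// of the file. If System.Threading is also imported, the type name may need to be qualified.
/// <code>
/// Timeout oneSecondAbort = new Timeout(1000, Timeout.TO_ABORTREPLY);
/// </code>
/// </example>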
public class Timeout
{
internal int when;
internal int action;
internal string origin;
/// <summary>
/// <bf>TO_ABORTREPLY:</bf> Abort this Query by simulating an <bf>AbortReply()</bf>
/// </summary>
public const int TO_ABORTREPLY = 0;
/// <summary>
/// <bf>TO_NULLREPLY:</bf> Stop waiting for the slow group member by simulating a <bf>NullReply()</bf> from that member
/// </summary>
public const int TO_NULLREPLY = 1;
/// <summary>
/// <bf>TO_FAILURE:</bf> Stop waiting for the slow group member by informing Vsync that the member has failed. This is an extreme action
/// and will cause Vsync to disconnect from that member and send it a poison pill, just in case it is still alive. Use with care!
/// </summary>
public const int TO_FAILURE = 2;
/// <summary>
/// <bf>TO_AGGFAILURE:</bf> For aggregations only, triggers an aggregation exception in the leader and, if the process responsible for the
/// timeout can be identified, causes that process to be poisoned.
/// </summary>
public const int TO_AGGFAILURE = 4;
/// <summary>
/// Constructs a new member of the <bf>Timeout</bf> class
/// </summary>
/// <param name="to">Delay until timeout occurs, in ms</param>
/// <param name="act">Action to take (for queries, one of TO_ABORTREPLY, TO_NULLREPLY, TO_FAILURE; for aggregation, TO_AGGFAILURE)</param>
public Timeout(int to, int act)
{
this.when = to;
this.action = act;
}
internal Timeout(int to, int act, string why)
{
this.when = to;
this.action = act;
this.origin = why;
}
}
/// <summary>
/// Various system constants. Some accessible from outside users
/// </summary>
internal static class Vsync
{
/// <summary>
/// Prints a very easily understood message if overloaded
/// </summary>
internal static bool VSYNC_SHUTDOWNIFOVERLOADED = true;
/// <summary>
/// Target size for ORACLE
/// </summary>
internal static int VSYNC_ORACLESIZE = 3;
/// <summary>
/// If a group view has 50 or more members, use OOB transfer to initialize joining members. Value should be larger than ORACLESIZE
/// </summary>
internal static int VSYNC_INITVIAOOB = 50;
/// <summary>
/// Can be overridden by environment variables. Becomes false if this process can't get the proper UDP port numbers
/// </summary>
internal static bool VSYNC_CANJOINORACLE = true;
/// <summary>
/// Maintain index if View members list is 16 or longer
/// </summary>
internal const int VSYNC_INDEXMEMBERS = 16;
/// <summary>
/// Largest configurations that have been tested reasonably carefully. We'll make this bigger and bigger over time.
/// </summary>
internal static int VSYNC_MAXSYSTEMSIZE = 2048;
/// <summary>
/// Inhibit new sends if more than this many messages are known to Vsync. Value adjusted each time view changes in VSYNCMEMBERS
/// </summary>
internal static int VSYNC_ASYNCMTOTALLIMIT = 100;
/// <summary>
/// MAXASYNCMTOTAL won't be allowed to become smaller than this value
/// </summary>
internal static int VSYNC_MINASYNCMTOTAL = 50;
internal static int VSYNC_MAXRBACKLOG = VSYNC_ASYNCMTOTALLIMIT;
/// <summary>
/// If a group has more than this many members, and a UDP-only multicast is attempted, Vsync switches to an overlay multicast
/// </summary>
internal static int VSYNC_MAXDIRECTSENDS = 16;
/// <summary>
/// We recommend keeping this fairly small to avoid excessive "memory pressure" on the kernel
/// </summary>
internal static long VSYNC_MAXMSGLEN = 32 * 1024;
/// <summary>
/// Used in the out of band transfer logic, initialized there
/// </summary>
internal static long VSYNC_OOBCHUNKSIZE;
/// <summary>
/// Vsync will round packets up to a multiple of this
/// </summary>
internal static int VSYNC_LEN_ROUNDUP = 1;
/// <summary>
/// Maximum packets to send per second, if non-zero
/// </summary>
internal static int VSYNC_RATELIM;
/// <summary>
/// Maximum size for data sent in a Send, OrderedSend or SafeSend, needed because Vsync flow control can malfunction with extremely large objects
/// </summary>
internal static long VSYNC_MAXMSGLENTOTAL = 256 * VSYNC_MAXMSGLEN;
/// <summary>
/// If true, Vsync uses SHA2/256 signatures to sign every marshalled object, and won't demarshall (hence won't accept) unsigned messages
/// </summary>
internal static bool VSYNC_SIGS = true;
/// <summary>
/// If provided, Vsync encrypts the SHA2/256 signatures with this key
/// </summary>
internal static byte[] VSYNC_AESKEY;
/// <summary>
/// How many acks to wait for when doing COMMIT in the ORACLE
/// </summary>
internal static int VSYNC_ACKTHRESHOLD = 2;
internal static Aes VSYNC_AES;
internal static LockObject VSYNC_AES_LOCK = new LockObject("VSYNC_AES_LOCK");
/// <summary>
/// Used to seed the initialization vector employed by AES
/// </summary>
internal static RNGCryptoServiceProvider VSYNC_AESSEED;
/// <summary>
/// If non-empty, the names of nodes where ORACLE instances can be found (if any are running)
/// </summary>
internal static string VSYNC_HOSTS = string.Empty;
/// <summary>
/// If true, Vsync uses no IPMC at all, even for startup. (Value of MAXIPMCADDRS ignored in this case)
/// </summary>
internal static bool VSYNC_UNICAST_ONLY = false;
/// <summary>
/// Tells Vsync to use TCP rather than Vsync native communication (over UDP) for OOB UNICAST transfers
/// </summary>
internal static bool VSYNC_OOBVIATCP = true;
/// <summary>
/// Provides a form of user-level control over use of the new RDMA (verbs) mechanisms. For internal/testing use only
/// </summary>
internal static bool VSYNC_USERDMA = false;
internal static int VSYNC_GROUPPORT = 11002;
/// <summary>
/// Vsync allocates virtual IPMC addresses in this range; must be large enough to permit groups to have unique gaddrs.
/// </summary>
internal static int VSYNC_MCRANGE_LOW = 5000;
/// <summary>
/// Physical ones use same range but are managed by Dr. Multicast
/// </summary>
internal static int VSYNC_MCRANGE_HIGH = VSYNC_MCRANGE_LOW + 500000;
/// <summary>
/// Listener port for incoming P2PSocket connection requests; AckSocket uses VSYNC_DEFAULT_PORTNOp+1
/// </summary>
internal static int VSYNC_DEFAULT_PORTNOp = 9753;
/// <summary>
/// Listener port for incoming AckSocket connection requests
/// </summary>
internal static int VSYNC_DEFAULT_PORTNOa = VSYNC_DEFAULT_PORTNOp + 1;
/// <summary>
/// Listener port for incoming OOB via TCP connection requests
/// </summary>
internal static int VSYNC_DEFAULT_PORTNOt = VSYNC_DEFAULT_PORTNOp + 2;
/// <summary>
/// Limit on how many IPMC addresses can be in use other than for OOB code. WARNING: Can be temporarily exceeded while remapping.
/// </summary>
internal static int VSYNC_MAXIPMCADDRS = 25;
/// <summary>
/// Limit on how many IPMC addresses can be used for OOB code
/// </summary>
internal static int OOBMAXIPMCADDRS = 25;
internal static byte[] VSYNC_HDR = Msg.StringToBytes("->VSYNC<-");
internal static bool VSYNC_LOG_CREATED = false;
internal static Address foundOracle;
internal static Address my_address;
internal static byte[] my_address_bytes;
internal static List<Address> recent_inquiries = new List<Address>();
internal static LockObject recent_inquiries_lock = new LockObject("recent_inquiries_lock");
/// <summary>
/// Result of applying IPExtractAddrs to VSYNC_HOSTS
/// </summary>
internal static IPAddress[] VSYNC_HOSTS_IPADDRS;
internal static Thread receiveThread;
internal static int CLASSD = 224 << 24;
/// <summary>
/// Padding added to the end of a serialized message to hold the signature, if any
/// </summary>
internal static int VSYNC_MSGPADDING = 0;
/// <summary>
/// How many physical IPMC addresses are being used right now
/// </summary>
internal static int nPhysAddrsInUse = 0;
/// <summary>
/// Overhead for a Vsync ACK packet, which is just a byte array inside an enclosing byte array
/// </summary>
internal static int VSYNC_BAOVERHEAD = 100;
/// <summary>
/// How many bytes Vsync adds as overhead (quite a few, in the worst case)
/// </summary>
internal static int VSYNC_OVERHEAD = 900;
/// <summary>
/// Vsync will fragment objects this large or larger...
/// </summary>
internal static long VSYNC_MUSTFRAGMENT;
/// <summary>
/// ... into objects of this size.
/// </summary>
internal static long VSYNC_FRAGLEN;
internal static int VSYNC_MCMDBBSIZE = 512;
/// <summary>
/// Forces garbage collection every 2 seconds, if more than a very few messages are "active" in Vsync (currently, 10)
/// </summary>
internal static int VSYNC_GCFREQ = 2000;
/// <summary>
/// Report MCMD stats every 5 minutes of runtime
/// </summary>
internal static int VSYNC_MCMDREPORTRATE = 5 * 60;
/// <summary>
/// 20 second grace period during startup to deal with C# class loader locking I/O here and there
/// </summary>
internal static long GRACEPERIOD = 20000;
internal static byte[] VSYNC_OK = { (byte)'O', (byte)'K' };
/// <summary>
/// True if Vsync shouldn't multicast on the first interface
/// </summary>
internal static bool VSYNC_SKIP_FIRSTINTERFACE = false;
internal static bool VSYNC_DONT_COMPRESS = false;
internal static string VSYNC_NETWORK_INTERFACES;
internal static List<int> InterfaceIds;
internal static List<IPAddress> VSYNC_MY_IPADDRS;
/// <summary>
/// After retransmitting, minimum delay before doing it a second time (hence the 3rd send)
/// </summary>
internal static int VSYNC_MIN2NDRTSEND = 500;
/// <summary>
/// Much fudging needed to get these right
/// </summary>
internal static int VSYNC_MAXRETRIES = 3;
/// <summary>
/// Much fudging needed to get these right. My original theory was that this should be O(log(N)) but for now I've pegged it at 3
/// </summary>
internal static int VSYNC_MAXLGRETRIES = 3;
/// <summary>
/// Default timeout, in milliseconds (15 seconds)
/// </summary>
internal static int VSYNC_DEFAULTTIMEOUT = 15000;
/// <summary>
/// Set to true if VSYNC_DEFAULTTIMEOUT has been doubled because the application is sending large objects
/// </summary>
internal static bool BigTimeouts = false;
/// <summary>
/// Warning threshold, in milliseconds (5 seconds)
/// </summary>
internal static int VSYNC_WARNAFTER = 5000;
/// <summary>
/// Remap delay, in milliseconds (15 minutes)
/// </summary>
internal static int VSYNC_REMAPDELAY = 15 * 60 * 1000;
/// <summary>
/// Ticks per millisecond (one tick is 100 nanoseconds)
/// </summary>
internal static int MS = 10000;
internal static int VSYNC_ISBIG = 5;
/// <summary>
/// 0: don't route; 1:LAN only. Increase with care or you might DDoS the whole data center!
/// </summary>
internal static int VSYNC_TTL = 1;
/// <summary>
/// Pause between passing tokens in ms, 20x faster after recent activity. Must be multiple of 20
/// </summary>
internal static int VSYNC_TOKEN_DELAY = 40;
/// <summary>
/// Set to true in RunAsWorker()
/// </summary>
internal static bool WORKER_MODE = false;
/// <summary>
/// Set to true when VsyncSystem.Start() is ready to wait
/// </summary>
internal static volatile bool OK_TO_SEND_WORKER_REQ = false;
internal static volatile Address MY_MASTER = null;
internal static volatile Address MY_OLD_MASTER = null;
internal static volatile bool heardFromMaster;
/// <summary>
/// For message id's on one-to-one ping messages, which show up with gaddr=NULLADDRESS
/// </summary>
internal static int OneToOneCntr = 0;
internal static bool VSYNC_UDPCHKSUM = false;
internal static int my_pid = Process.GetCurrentProcess().Id;
/// <summary>
/// Address of this process
/// </summary>
internal static IPAddress my_IPaddress;
internal static string my_host;
internal static bool IAmOracle = false;
internal static Group ORACLE;
internal static Group VSYNCMEMBERS;
/// <summary>
/// Only active in the Oracle group
/// </summary>
internal static Thread OracleViewThread;
internal static ViewDelta[] Proposed;
internal static int LeaderId;
/// <summary>
/// True once this instance is running the "oracle leader" code
/// </summary>
internal static volatile bool RunningLeaderLogic;
internal static Address NULLADDRESS = new Address();
internal static Address ClientOf;
internal static long OracleFailedAt;
internal static int newClientOfCnt;
/// <summary>
/// False disables all logging by Vsync
/// </summary>
internal static bool VSYNC_LOGGED = true;
/// <summary>
/// True if you want Vsync to configure VSYNCMEMBERS as a "large" group
/// </summary>
internal static bool VSYNC_LARGE = false;
internal static bool VSYNC_MUTE = false;
internal static bool VSYNC_TRACKTHREADWAITS = true;
internal static bool VSYNC_IGNORESMALLPARTITIONS = true;
internal static bool VSYNC_IGNOREPARTITIONS = false;
internal static bool VSYNC_INFINIBAND;
internal static int VSYNC_LID;
internal static string VSYNC_LOGDIR = "logs";
internal static Stream my_logstream;
internal static Semaphore lSema = new Semaphore(1, int.MaxValue);
internal static int MapperEpochId;
internal static string VSYNC_NETMASK = string.Empty;
internal static IPAddress VSYNC_NETMASK_ADDR;
internal static string VSYNC_SUBNET = string.Empty;
internal static IPAddress VSYNC_SUBNET_ADDR;
internal static bool VSYNC_GRACEFULSHUTDOWN = false;
internal static Semaphore VSYNC_SLEEP = new Semaphore(0, int.MaxValue);
internal static void Sleep(int howLong)
{
ILock.NoteThreadState("Sleep(" + howLong + ")");
VSYNC_SLEEP.WaitOne(howLong);
ILock.NoteThreadState(null);
}
internal static List<Address> RIPList = new List<Address>();
internal static LockObject RIPLock = new LockObject("RIPLock");
internal static string RIPListState()
{
string s;
using (var tmpLockObj = new LockAndElevate(RIPLock))
{
s = "RIPList = " + Address.VectorToString(RIPList.ToArray());
}
using (var tmpLockObj = new LockAndElevate(Group.GroupRIPLock))
{
s += "; Group RIPList = " + Address.VectorToString(Group.GroupRIPList.ToArray()) + Environment.NewLine;
}
return s;
}
internal const int JOIN = -1;
internal const int LEAVE = -2;
internal const int FDETECTION = -3;
internal const int PROPOSE = -4;
internal const int INITIALVIEW = -5;
internal const int INQUIRE = -6;
internal const int COMMIT = -7;
internal const int FANNOUNCE = -8;
internal const int ORDEREDSEND = -9;
internal const int SETORDER = -10;
internal const int RELAYJOIN = -11;
internal const int RELAYLEAVE = -12;
internal const int STATEXFER = -13;
internal const int TERMINATE = -14;
internal const int RELAYTERM = -15;
internal const int REMAP = -16;
internal const int SAFESEND = -17;
internal const int SAFEDELIVER = -18;
internal const int JOINFAILED = -19;
internal const int RELAYSEND = -20;
internal const int DALDONE = -21;
internal const int ISSTABLE = -22;
internal const int BECLIENT = -23;
internal const int ORACLERUNNING = -24;
internal const int PARTITIONED = -25;
internal const int RELAYREGISTERVG = -26;
internal const int REGISTERVG = -27;
internal const int IM_DHT_PUT = -28;
internal const int IM_DHT_GET = -29;
internal const int IM_UDP_TUNNEL = -30;
internal const int IM_IPMC_TUNNEL = -31;
internal const int IM_IPMC_VIEWS = -32;
internal const int CRYPTOWRAPPED = -33;
internal const int CLIENTWRAPPED = -34;
internal const int FRAGMENT = -35;
internal const int CAUSALSEND = -36;
internal const int DISKLOGGER = -37;
internal const int PING = -38;
internal const int CANBEORACLE = -39;
internal const int LOCKREQ = -40;
internal const int SGAGGREGATE = -41;
internal const int OUTOFBAND = -42;
internal const int IBADDRS = -43;
internal const int SYSTEMREQS = 43;
internal static string rToString(object o)
{
string[] names = { "JOIN", "LEAVE", "FDETECTION", "PROPOSE", "INITIALVIEW", "INQUIRE", "COMMIT", "FANNOUNCE", "ORDEREDSEND", "SETORDER", "RELAYJOIN", "RELAYLEAVE", "STATEXFER", "TERMINATE", "RELAYTERM", "REMAP", "SAFESEND", "SAFEDELIVER", "JOINFAILED", "RELAYSEND", "DALDONE", "ISSTABLE", "BECLIENT", "ORACLERUNNING", "PARTITIONED", "RelayRegisterVG", "RegisterVG", "DHTPut", "DHTGet", "UDP Tunnel", "IPMC Tunnel", "IPMC Views", "CRYPTOWRAP", "CLIENT-TO-GROUP", "FRAGMENT", "CAUSAL SEND", "DISK LOGGER", "PING", "CAN BE ORACLE", "LOCKREQ", "Small Group AGGREGATE", "OutOfBand", "IBADDRS" };
if (o is int)
{
int r = (-(int)o) - 1;
if (r >= 0 && r < names.Length)
{
return names[r];
}
return "Handler[" + (int)o + "]";
}
return o.GetType().ToString();
}
public static void WriteLine()
{
WriteLine(string.Empty);
}
public static void WriteLine(string what)
{
if (!VsyncSystem.VsyncActive && VsyncSystem.VsyncWasActive)
{
return;
}
if (VSYNC_LOGGED)
{
Write(what + Environment.NewLine);
}
}
// This method is unsafe and should only be used internally in Vsync.cs for debugging purposes.
internal static string ExtractStackTrace()
{
return ExtractStackTrace(Thread.CurrentThread, int.MaxValue);
}
// This method is unsafe and should only be used internally in Vsync.cs for debugging purposes.
internal static string ExtractStackTrace(Thread targetThread, int depth)
{
#if EXTRACTCALLSTACKS
try
{
if (targetThread != Thread.CurrentThread && (targetThread.ThreadState & System.Threading.ThreadState.Running) != 0)
{
return "(can't extract stack trace on a running thread)";
}
string st = "{ ";
// Label the trace with the name of the thread being traced, not the caller's
if (targetThread.Name != null)
{
st = "Thread[" + targetThread.Name + "]{ ";
}
StackTrace stackTrace = new StackTrace(targetThread, false);
StackFrame[] stackFrames = stackTrace.GetFrames();
// write call stack method names
int ignore = 2;
foreach (StackFrame stackFrame in stackFrames)
{
if (ignore-- <= 0 && depth-- >= 0)
{
st += stackFrame.GetMethod().Name + "... ";
}
}
return st + "}";
}
catch (ThreadStateException)
{
return "(attempt to trace thread triggered thread state exception)";
}
#else // EXTRACTCALLSTACKS
return "??";
#endif // EXTRACTCALLSTACKS
}
public static void Write(string what)
{
if (!VSYNC_MUTE)
{
if (VsyncSystem.noConsole)
{
Debug.Write(what);
}
else
{
Console.Write(what);
}
}
ILock.NoteThreadState("lsema.WaitOne()");
lSema.WaitOne();
ILock.NoteThreadState(null);
try
{
if (VSYNC_LOGGED && my_logstream != null)
{
byte[] buf = Msg.StringToBytes(what);
my_logstream.Write(buf, 0, buf.Length);
my_logstream.Flush();
}
}
finally
{
// Release the semaphore even if the log write throws
lSema.Release();
}
}
internal static void WriteLineLog(string what)
{
WriteLog(what + Environment.NewLine);
}
internal static void WriteLog(string what)
{
if (!VsyncSystem.VsyncActive && VsyncSystem.VsyncWasActive)
{
return;
}
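// The log stream is null until the log file has been created, and again after CloseLog()
if (VSYNC_LOGGED && my_logstream != null)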
{
byte[] buf = Msg.StringToBytes(what);
my_logstream.Write(buf, 0, buf.Length);
my_logstream.Flush();
}
}
public static void CloseLog()
{
ILock.NoteThreadState("lsema.WaitOne()");
lSema.WaitOne();
ILock.NoteThreadState(null);
try
{
if (VSYNC_LOGGED)
{
VSYNC_LOGGED = false;
// The stream is null if the log file was never created
if (my_logstream != null)
{
my_logstream.Flush();
my_logstream.Close();
my_logstream = null;
}
}
}
finally
{
lSema.Release();
}
}
internal static void ArrayResize<T>(ref T[] oldArray, int len)
{
Array.Resize(ref oldArray, len);
}
internal class TCB
{
internal int id;
internal long when;
internal TimerCallback cb;
internal TCB(int tid, long t, TimerCallback tcb)
{
this.id = tid;
this.when = t;
this.cb = tcb;
}
}
private static readonly LinkedList<TCB> timer_list = new LinkedList<TCB>();
private static readonly LockObject timer_lock = new LockObject("timer_lock");
private static readonly int ENDOFTIME = timer_list.AddLast(new TCB(-1, long.MaxValue, () => { throw new VsyncException("end of time"); })).Value.id;
private static readonly double DURATIONMS = 1000.0 / Stopwatch.Frequency;
private static readonly double DURATIONTICK = 10000000.0 / Stopwatch.Frequency;
private static readonly Semaphore timer_wait = new Semaphore(0, int.MaxValue);
internal static Thread timer_thread;
/// <summary>
/// Gets the time since the system was started in milliseconds.
/// </summary>
/// <returns>The time since the system was started in milliseconds.</returns>
internal static long NOW
{
get
{
return (long)(Stopwatch.GetTimestamp() * DURATIONMS - VsyncSystem.StartedAt);
}
}
/// <summary>
/// Gets the precise time since the system was started in ticks (100 nanoseconds).
/// </summary>
/// <returns>The precise time since the system was started in ticks (100 nanoseconds).</returns>
/// <remarks>This is used when printing the times at which packets were sent/received.</remarks>
internal static long TICKS
{
get
{
return (long)(Stopwatch.GetTimestamp() * DURATIONTICK);
}
}
internal static int TID;
/// <summary>
/// Register for a callback from Vsync after a designated number of ms, using a new thread.
/// </summary>
/// <param name="ms">Delay in milliseconds</param>
/// <param name="cb">Method to call, on a new thread</param>
/// <returns>The timer ID, which can be passed to TimerReset or TimerCancel</returns>
public static int OnTimerThread(int ms, TimerCallback cb)
{
return OnTimer(ms, () => new Thread(() =>
{
try
{
cb();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "OnTimer callback: " + cb.Method, IsBackground = true }.Start());
}
/// <summary>
/// Register for a callback from Vsync after a designated number of ms
/// </summary>
/// <param name="ms">Delay in milliseconds</param>
/// <param name="cb">Method to call</param>
/// <returns>The timer ID, which can be passed to TimerReset or TimerCancel</returns>
public static int OnTimer(long ms, TimerCallback cb)
{
ms += NOW;
// OnTimer can be called from many threads, so allocate the ID atomically
TCB newtcb = new TCB(Interlocked.Increment(ref TID), ms, cb);
return InsertOnTimerQueue(newtcb);
}
private static int InsertOnTimerQueue(TCB newtcb)
{
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
bool new_first = true;
for (LinkedListNode<TCB> tcbnode = timer_list.First; tcbnode != null; tcbnode = tcbnode.Next)
{
if (tcbnode.Value.when > newtcb.when)
{
timer_list.AddBefore(tcbnode, newtcb);
if ((VsyncSystem.Debug & VsyncSystem.TIMERS) != 0)
{
VsyncSystem.WriteLine("InsertOnTimerQueue add callback with id " + newtcb.id + " registered before id " + tcbnode.Value.id + " for time " + newtcb.when + " at time " + NOW);
}
timer_wait.Release();
return newtcb.id;
}
new_first = false;
}
timer_list.AddLast(newtcb);
if ((VsyncSystem.Debug & VsyncSystem.TIMERS) != 0)
{
VsyncSystem.WriteLine("InsertOnTimerQueue add callback with id " + newtcb.id + " registered at tail for time " + newtcb.when + " at time " + NOW);
}
if (new_first)
{
timer_wait.Release();
}
return newtcb.id;
}
}
/// <summary>
/// Adjust a callback to occur at a new deadline time
/// </summary>
/// <param name="tid">Timer ID</param>
/// <param name="newDeadline">New deadline, in ms relative to NOW</param>
public static void TimerReset(int tid, long newDeadline)
{
if (tid == ENDOFTIME)
{
return;
}
LinkedListNode<TCB> tcbnode;
newDeadline += NOW;
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
for (tcbnode = timer_list.First; tcbnode != null; tcbnode = tcbnode.Next)
{
if (tcbnode.Value.id == tid)
{
timer_list.Remove(tcbnode);
break;
}
}
}
if (tcbnode != null)
{
tcbnode.Value.when = newDeadline;
InsertOnTimerQueue(tcbnode.Value);
}
}
public static void TimerCancel(int tid)
{
if (tid == ENDOFTIME)
{
return;
}
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
for (LinkedListNode<TCB> tcb = timer_list.First; tcb != null; tcb = tcb.Next)
{
if (tcb.Value.id == tid)
{
timer_list.Remove(tcb);
break;
}
}
}
}
internal static void TimerThread()
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
try
{
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[2]++;
List<TimerCallback> cbs = new List<TimerCallback>();
VsyncSystem.CheckParentThread();
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
LinkedListNode<TCB> tcb;
while ((tcb = timer_list.First).Value.when <= NOW)
{
if ((VsyncSystem.Debug & VsyncSystem.TIMERS) != 0)
{
Vsync.WriteLine("TimerThread to call Timer Callback with id " + tcb.Value.id + " registered for time " + tcb.Value.when + " at time " + NOW);
}
long delta = NOW - tcb.Value.when;
if (delta > 10000)
{
Vsync.WriteLine("WARNING: Vsync timer thread woke up " + delta + "ms late!");
}
timer_list.Remove(tcb);
cbs.Add(tcb.Value.cb);
}
}
foreach (TimerCallback cb in cbs)
{
cb();
}
int delay;
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
delay = (int)Math.Min(2500, timer_list.First.Value.when - NOW);
}
if (delay > 0)
{
timer_wait.WaitOne(delay);
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
}
internal static string GetTimerState()
{
string s = "Timer State: NOW = " + MsToSecs(NOW) + "... ";
// Omit unless explicitly requested: too verbose...
if ((VsyncSystem.Debug & VsyncSystem.TIMERS) != 0)
{
using (var tmpLockObj = new LockAndElevate(timer_lock))
{
for (LinkedListNode<TCB> tcb = timer_list.First; tcb != null; tcb = tcb.Next)
{
if (tcb.Value.when != long.MaxValue)
{
s += Environment.NewLine + " [" + tcb.Value.id + "] At " + MsToSecs(tcb.Value.when) + " do callback to " + tcb.Value.cb.Method;
}
}
}
}
s += Environment.NewLine + ReliableSender.HeardFromState();
return s;
}
// Prints in s.xxx format
internal static string MsToSecs(long when)
{
return (when / 1000L) + "." + (when % 1000L).ToString("D3");
}
// Prints in hr:min:sec.xxxxxx format
internal static string TimeToString(long when)
{
return new TimeSpan(when * 10000L).ToString();
}
public static void NodeHasFailed(Address which, string howDiscovered, bool inhibitReport)
{
if (Vsync.VSYNCMEMBERS == null || !Vsync.VSYNCMEMBERS.HasFirstView)
{
throw new VsyncException("Lost connection to Vsync system during startup.");
}
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (!Vsync.RIPList.Contains(which))
{
Vsync.RIPList.Add(which);
}
else
{
return;
}
}
if ((VsyncSystem.Debug & VsyncSystem.FAILURES) != 0)
{
Vsync.WriteLine("NodeHasFailed: " + which + " (" + howDiscovered + "), inhibitReport=" + inhibitReport);
}
if (Vsync.ClientOf != null && Vsync.ClientOf == which)
{
if (Vsync.VSYNC_ORACLESIZE == 1)
{
throw new VsyncShutdownException("System shutdown: Vsync was configured with VSYNC_ORACLESIZE=1 and the Oracle has failed or terminated.");
}
if (Vsync.OracleFailedAt != 0 && (Vsync.NOW - Vsync.OracleFailedAt) > Vsync.VSYNC_DEFAULTTIMEOUT)
{
throw new VsyncShutdownException("System shutdown: After Vsync ORACLE member " + Vsync.ClientOf + " failed, this client was unable to contact any other ORACLE member.");
}
Vsync.WriteLine("WARNING: Lost connection to the ORACLE; attempting to reconnect...");
if (Vsync.OracleFailedAt == 0)
{
Vsync.OracleFailedAt = Vsync.NOW;
}
Vsync.OnTimer(2500, () =>
{
if (Vsync.ClientOf != null && Vsync.ClientOf == which)
{
throw new VsyncException("System shutdown: After Vsync ORACLE member " + Vsync.ClientOf + " failed, this client was unable to contact any other ORACLE member.");
}
});
}
if (!ORACLE.HasFirstView || !VsyncSystem.VsyncActive || VsyncSystem.VsyncRestarting)
{
return;
}
// Clean up the CanBeOracleList to make sure we don't try to add a deceased process to the ORACLE
using (var tmpLockObj = new LockAndElevate(CanBeOracleListLock))
{
CanBeOracleList.Remove(which);
}
string name = Thread.CurrentThread.Name ?? "Unnamed thread";
if (which.isMyAddress())
{
throw new VsyncException("[" + name + "] Vsync node " + Vsync.my_address + " detected its own failure(" + howDiscovered + "); shutting down: " + VsyncSystem.GetState());
}
if ((VsyncSystem.Debug & VsyncSystem.FAILURES) != 0)
{
Vsync.WriteLine("[" + name + "] NodeHasFailed " + howDiscovered + " failure discovery event for " + which + ", " + VsyncSystem.GetState());
}
ReliableSender.NodeHasFailed(which);
if (!inhibitReport)
{
int rank = ORACLE.theView.GetRankOf(which);
if (rank != -1)
{
ORACLE.theView.noteFailed(rank);
}
if (Vsync.ClientOf == null)
{
ORACLE.doSend(false, false, Vsync.FDETECTION, which);
}
else if (Vsync.ClientOf != which)
{
ORACLE.doP2PSend(Vsync.ClientOf, true, Vsync.FDETECTION, which);
}
else
{
View v = Vsync.VSYNCMEMBERS.theView;
if (v != null)
{
foreach (Address m in v.members)
{
if (m.isMyAddress())
{
continue;
}
if (!Vsync.IsAlive(m))
{
continue;
}
if ((VsyncSystem.Debug & VsyncSystem.FAILURES) != 0)
{
Vsync.WriteLine("[" + name + "] Sending FDETECTION notification in Vsync.VSYNCMEMBERS to " + m);
}
Vsync.VSYNCMEMBERS.doP2PSend(m, true, Vsync.FDETECTION, which);
}
}
// We need to give the new ORACLE leader time to take over and get back in touch
// This loop will run for 30 seconds, checking 4 times per second
for (int nt = 0; nt < 120; nt++)
{
Vsync.Sleep(250);
if (ORACLE.theView.GetRankOf(which) == -1)
{
return;
}
}
VsyncSystem.Shutdown("Client lost connectivity to core Vsync system");
}
}
}
private static bool IsAlive(Address m)
{
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (Vsync.RIPList.Contains(m))
{
return false;
}
}
return true;
}
public static void GroupHasFailed(string why)
{
throw new VsyncException(why);
}
internal static bool BVCompare(byte[] a, byte[] b)
{
if (a.Length != b.Length)
{
return false;
}
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i])
{
return false;
}
}
return true;
}
private static bool rentry;
internal static IPAddress setMyAddress()
{
if (rentry)
{
return Vsync.my_IPaddress;
}
rentry = true;
try
{
IDictionary environmentVariables = Environment.GetEnvironmentVariables();
foreach (DictionaryEntry de in environmentVariables)
{
switch ((string)de.Key)
{
case "VSYNC_PORTNO":
VSYNC_GROUPPORT = int.Parse((string)de.Value);
break;
case "VSYNC_SKIP_FIRSTINTERFACE":
VSYNC_SKIP_FIRSTINTERFACE = bool.Parse((string)de.Value);
break;
case "VSYNC_DONT_COMPRESS":
VSYNC_DONT_COMPRESS = bool.Parse((string)de.Value);
break;
case "VSYNC_LOGGED":
VSYNC_LOGGED = bool.Parse((string)de.Value);
break;
case "VSYNC_LARGE":
VSYNC_LARGE = bool.Parse((string)de.Value);
break;
case "VSYNC_MUTE":
VSYNC_MUTE = bool.Parse((string)de.Value);
break;
case "VSYNC_IGNORESMALLPARTITIONS":
VSYNC_IGNORESMALLPARTITIONS = bool.Parse((string)de.Value);
break;
case "VSYNC_IGNOREPARTITIONS":
VSYNC_IGNOREPARTITIONS = bool.Parse((string)de.Value);
break;
case "VSYNC_GCFREQ":
VSYNC_GCFREQ = int.Parse((string)de.Value);
break;
case "VSYNC_MCRANGE_LOW":
VSYNC_MCRANGE_LOW = int.Parse((string)de.Value);
break;
case "VSYNC_MCRANGE_HIGH":
VSYNC_MCRANGE_HIGH = int.Parse((string)de.Value);
break;
case "VSYNC_MAXIPMCADDRS":
VSYNC_MAXIPMCADDRS = int.Parse((string)de.Value);
break;
case "VSYNC_MAXDIRECTSENDS":
VSYNC_MAXDIRECTSENDS = int.Parse((string)de.Value);
break;
case "VSYNC_UNICAST_ONLY":
VSYNC_UNICAST_ONLY = bool.Parse((string)de.Value);
break;
case "VSYNC_OOBVIATCP":
VSYNC_OOBVIATCP = bool.Parse((string)de.Value);
break;
case "VSYNC_UDPCHKSUM":
VSYNC_UDPCHKSUM = bool.Parse((string)de.Value);
break;
case "VSYNC_CANJOINORACLE":
VSYNC_CANJOINORACLE = bool.Parse((string)de.Value);
break;
case "VSYNC_MCMDREPORTRATE":
VSYNC_MCMDREPORTRATE = int.Parse((string)de.Value);
break;
case "VSYNC_LEN_ROUNDUP":
VSYNC_LEN_ROUNDUP = int.Parse((string)de.Value);
break;
case "VSYNC_RATELIM":
VSYNC_RATELIM = int.Parse((string)de.Value);
break;
case "VSYNC_TTL":
VSYNC_TTL = int.Parse((string)de.Value);
break;
case "VSYNC_TOKEN_DELAY":
VSYNC_TOKEN_DELAY = Math.Min(1000, Math.Max(1, int.Parse((string)de.Value)));
break;
case "VSYNC_LOGDIR":
VSYNC_LOGDIR = (string)de.Value;
break;
case "VSYNC_USERDMA":
VSYNC_USERDMA = bool.Parse((string)de.Value);
break;
case "VSYNC_NETWORK_INTERFACES":
VSYNC_NETWORK_INTERFACES = (string)de.Value;
if (VSYNC_NETWORK_INTERFACES.Length == 0)
{
VSYNC_NETWORK_INTERFACES = null;
}
break;
case "VSYNC_HOSTS":
VSYNC_HOSTS = (string)de.Value;
break;
case "VSYNC_NETMASK":
VSYNC_NETMASK = (string)de.Value;
break;
case "VSYNC_SUBNET":
VSYNC_SUBNET = (string)de.Value;
break;
case "VSYNC_GRACEFULSHUTDOWN":
VSYNC_GRACEFULSHUTDOWN = bool.Parse((string)de.Value);
break;
case "VSYNC_PORTNOp":
VSYNC_DEFAULT_PORTNOp = int.Parse((string)de.Value);
VSYNC_DEFAULT_PORTNOa = VSYNC_DEFAULT_PORTNOp + 1;
VSYNC_DEFAULT_PORTNOt = VSYNC_DEFAULT_PORTNOp + 2;
break;
case "VSYNC_PORTNOa":
throw new VsyncException("VSYNC_PORTNOa is automatically determined from VSYNC_PORTNOp and cannot be directly changed");
case "VSYNC_AESKEY":
Group.doInitializeAes(out VSYNC_AES);
VSYNC_AESKEY = byteVecParse((string)de.Value);
VSYNC_SIGS = true;
break;
case "COMPUTERNAME":
case "HOSTNAME":
Vsync.my_host = (string)de.Value;
break;
}
}
if (!string.IsNullOrEmpty(VSYNC_SUBNET) && !string.IsNullOrEmpty(VSYNC_NETMASK))
{
if (!IPAddress.TryParse(VSYNC_SUBNET, out VSYNC_SUBNET_ADDR) || !IPAddress.TryParse(VSYNC_NETMASK, out VSYNC_NETMASK_ADDR))
{
throw new VsyncException("Unable to parse VSYNC_SUBNET or VSYNC_NETMASK");
}
}
if (VSYNC_AES != null)
{
VSYNC_AESSEED = new RNGCryptoServiceProvider();
}
if (VSYNC_SIGS)
{
VSYNC_MSGPADDING = VSYNC_AES == null ? 32 : 48;
}
VSYNC_TOKEN_DELAY = Math.Max(20, (VSYNC_TOKEN_DELAY / 20) * 20);
VSYNC_HOSTS_IPADDRS = ExtractHostIPAddrs(VSYNC_HOSTS);
#if __MonoCS__
Vsync.Sleep(my_pid % 1000); // Hack to avoid triggering some sort of Dns bug, observed only on Mono
#endif
bool fndOne = false;
for (int retry = 0; !fndOne && retry < 24; retry++)
{
if (!VSYNC_UNICAST_ONLY || VSYNC_NETWORK_INTERFACES != null)
{
string[] which = null;
if (VSYNC_NETWORK_INTERFACES != null)
{
which = VSYNC_NETWORK_INTERFACES.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
}
InterfaceIds = ReliableSender.getNetworkInterfaces(which);
if ((VsyncSystem.Debug & VsyncSystem.INTERFACES) != 0)
{
Vsync.Write("VSYNC_NETWORK_INTERFACES = <");
for (int i = 0; i < InterfaceIds.Count; i++)
{
if (i > 0)
{
Vsync.Write(" ");
}
if (i == 0 && VSYNC_SKIP_FIRSTINTERFACE && InterfaceIds.Count > 1)
{
Vsync.Write("SKIPPING[" + InterfaceIds[i] + "]");
}
else
{
Vsync.Write(InterfaceIds[i].ToString(CultureInfo.InvariantCulture));
}
}
Vsync.WriteLine(">");
}
if (VSYNC_SKIP_FIRSTINTERFACE && InterfaceIds.Count > 1)
{
InterfaceIds.RemoveAt(0);
}
}
else
{
ReliableSender.getNetworkInterfaces(null);
}
if (Vsync.my_IPaddress == null)
{
// The loop runs retry = 0..23, so the localhost fallback must trigger on the final attempt
if (retry < 23)
{
Vsync.Sleep(5000);
}
else
{
Vsync.WriteLine("WARNING: No network interface supports IPv4... using `localhost=127.0.0.1' for my IP address (network connectivity will be limited)");
Vsync.my_IPaddress = new IPAddress(new byte[] { 127, 0, 0, 1 });
}
}
else
{
fndOne = true;
}
}
if (VSYNC_HOSTS_IPADDRS != null)
{
for (int i = 0; i < VSYNC_HOSTS_IPADDRS.Length; i++)
{
if (VSYNC_HOSTS_IPADDRS[i].Equals(nullIPAddr))
{
VSYNC_HOSTS_IPADDRS[i] = Vsync.my_IPaddress;
}
}
}
// VN - change on Android - storing the log file in the external storage directory in the Android device.
if (VSYNC_LOGGED && !VSYNC_LOG_CREATED)
{
#if __MonoCS__
string directory = "./OUT";
string fname = "./OUT/VSYNC-" + Vsync.my_IPaddress.ToString() + "-" + my_pid + ".log";
#else
string directory = VSYNC_LOGDIR;
string fname = Path.Combine(VSYNC_LOGDIR, "VSYNC-" + my_pid + ".log");
#endif // !__MonoCS__
try
{
Directory.CreateDirectory(directory);
}
catch (Exception)
{
// Best effort: a failure here surfaces when the log file is opened below
}
try
{
File.Delete(fname);
}
catch (Exception)
{
// Best effort: if a stale log can't be deleted, FileMode.CreateNew fails below
}
try
{
my_logstream = new FileStream(fname, FileMode.CreateNew);
VSYNC_LOG_CREATED = true;
string rev = "$Rev: 2095 $";
rev = rev.Substring(rev.IndexOf(" ", StringComparison.Ordinal) + 1);
rev = "Vsync V2.2 Revision " + rev.Substring(0, rev.IndexOf(" ", StringComparison.Ordinal));
string settings = "<" + Vsync.my_host + ": " + Vsync.my_IPaddress + ">" + Environment.NewLine + "Major Vsync Runtime Settings: ORACLESIZE = " + VSYNC_ORACLESIZE + "; ";
if (VSYNC_UNICAST_ONLY)
{
settings += "UNICAST_ONLY; ";
}
if (VSYNC_HOSTS.Length > 0)
{
settings += "VSYNC_HOSTS = {" + VSYNC_HOSTS + "}; ";
}
settings += Environment.NewLine + "P2P/ACK port numbers = {" + VSYNC_DEFAULT_PORTNOp + "/" + VSYNC_DEFAULT_PORTNOa + "}; ";
if (!VSYNC_UNICAST_ONLY)
{
settings += "IPMC base portno = " + VSYNC_GROUPPORT + Environment.NewLine + "IPMC address range = {" + MCMDSocket.PMCAddr(CLASSD + VSYNC_MCRANGE_LOW) + "-" + MCMDSocket.PMCAddr(CLASSD + VSYNC_MCRANGE_HIGH) + "}, MAXIPMCADDRS in use = " + VSYNC_MAXIPMCADDRS;
}
if (VSYNC_IGNOREPARTITIONS && VSYNC_ORACLESIZE > 1)
{
settings += Environment.NewLine + "WARNING: You have configured Vsync to ignore network partitioning failures.";
}
if (VSYNC_INFINIBAND)
{
settings += Environment.NewLine + " Using Infiniband. Searching for ib.dll in LD_LIBRARY_PATH={" + Environment.GetEnvironmentVariable("LD_LIBRARY_PATH") + "}";
}
#if __MonoCS__
settings = "; Compiled for MONO; " + settings;
#endif
#if __ANDROID__
settings = "; Compiled for Android; " + settings;
#endif
DateTime localNow = DateTime.UtcNow.ToLocalTime();
TimeSpan uptime = new TimeSpan(Vsync.TICKS);
byte[] header = Msg.StringToBytes(rev + ": " + localNow.ToShortDateString() + " " + localNow.ToLongTimeString() + " (" + uptime + ") " + settings + Environment.NewLine + "---------------------------------------------------------------------------------------------" + Environment.NewLine);
my_logstream.Write(header, 0, header.Length);
}
catch (IOException e)
{
VSYNC_LOGGED = false;
my_logstream = null;
if (VsyncSystem.noConsole)
{
Debug.WriteLine("WARNING: Unable to create Vsync log file: " + fname + "(" + e + ")");
}
else
{
Console.WriteLine("WARNING: Unable to create Vsync log file: " + fname + "(" + e + ")");
}
}
}
if (VSYNC_UNICAST_ONLY && VSYNC_HOSTS_IPADDRS == null)
{
throw new VsyncException("VSYNC: UNICAST_ONLY mode but you didn't specify VSYNC_HOSTS for initial rendezvous!");
}
VSYNC_FRAGLEN = Vsync.VSYNC_MAXMSGLEN - (Vsync.VSYNC_OVERHEAD * 2);
VSYNC_MUSTFRAGMENT = Vsync.VSYNC_MAXMSGLEN - Vsync.VSYNC_OVERHEAD;
Msg.Initialize();
my_address = new Address(Vsync.my_IPaddress, my_pid);
Vsync.WriteLineLog("... MyAddress = " + Vsync.my_address);
ReliableSender.Init();
my_address_bytes = Msg.toBArray(my_address);
Msg.doRegisterType<Vsync.ViewDelta>(Msg.VIEWDELTA);
Msg.doRegisterType<Vsync.UnstableList>(Msg.UNSTABLE);
Msg.doRegisterType<Group.tokenInfo>(Msg.TOKENINFO);
Msg.doRegisterType<Group.FlushAggKey>(Msg.FLSHAGGKEY);
Msg.doRegisterType<MCMDSocket.GRPair>(Msg.GRPAIR);
Msg.doRegisterType<Group.LockInfo>(Msg.LOCKINFO);
Msg.doRegisterType(typeof(QueryKey<>), Msg.QUERYKEY);
Msg.doRegisterType<Group.DHTItem>(Msg.DHTITEM);
Msg.doRegisterType<Group.osspq>(Msg.OSSPQ);
Msg.doRegisterType<Group.OOBRepInfo>(Msg.OOBREPINFO);
Msg.doRegisterType<Group.LockReq>(Msg.LOCKREQ);
if (!VSYNC_USERDMA)
{
Vsync.VSYNC_INFINIBAND = false; // In this situation, the user told us not to use IB verbs (aka "RDMA")
}
#if !__MonoCS__
if (Vsync.VSYNC_INFINIBAND)
{
Vsync.WriteLine("WARNING: Detected an INFINIBAND device, but will use IP on IB rather than verbs (ib.dll not yet tested on Windows)");
}
Vsync.VSYNC_INFINIBAND = false;
#endif
if (Vsync.VSYNC_INFINIBAND && IB.init())
{
VSYNC_LID = IB.IB_getlid();
if (VSYNC_LID == 0)
{
throw new VsyncException("Vsync on Infiniband: device lid was 0, but this value is not supported");
}
VSYNC_OOBCHUNKSIZE = 1024 * 1024 * 1024;
VSYNC_UNICAST_ONLY = true;
}
else
{
VSYNC_OOBCHUNKSIZE = Math.Max(512, VSYNC_MAXMSGLEN - 2048);
}
new Thread(ReliableSender.TokenThread) { Name = "Vsync token-loop thread", IsBackground = true }.Start();
/*
* The rather tortured logic that follows tries to deal with a wide range of startup scenarios including multicast supported or not,
* Oracle already running or not, several copies launched simultaneously as opposed to one by one, machines lightly or heavily loaded, etc.
* All of this makes for a "mess". Sorry.
*/
ORACLE = new Group("ORACLE");
SetupORACLE();
if (Vsync.WORKER_MODE)
{
return null;
}
int OracleTries = 0;
bool again = true;
while (again)
{
again = false;
if (OracleTries > 0)
{
Vsync.Sleep(OracleTries * 5000);
}
bool cantBeOracle = false;
int theDelay = 1000;
if (!VSYNC_CANJOINORACLE || (!string.IsNullOrEmpty(Vsync.VSYNC_HOSTS) && !IAmInHostList()))
{
cantBeOracle = true;
VSYNC_CANJOINORACLE = false;
Vsync.WriteLineLog("Vsync: This instance can't be the ORACLE, joining as a client");
}
if (VsyncSystem.fastStart)
{
Vsync.WriteLineLog("WARNING: FastStart (skipping search for the Vsync ORACLE)");
}
for (int retry = 0; Vsync.ClientOf == null && !VsyncSystem.fastStart && retry < (cantBeOracle ? 20 : 5) && Vsync.foundOracle == null; retry++)
{
tryToJoin();
if (!cantBeOracle && Vsync.foundOracle == null && retry == 0)
{
Vsync.WriteLineLog("Vsync: Searching for the Vsync ORACLE...");
}
Vsync.Sleep(theDelay);
if (Vsync.ClientOf != null || (ORACLE.theView != null && Vsync.ORACLE.theView.GetMyRank() != -1))
{
break;
}
Vsync.Sleep(theDelay);
}
if (Vsync.foundOracle != null && Vsync.ClientOf == null)
{
int totalTryTime = 0;
Vsync.WriteLog("Vsync: Found the Vsync.ORACLE service, attempting to connect.");
for (int retry = 0; (!ORACLE.HasFirstView || ORACLE.theView.GetMyRank() == -1) && Vsync.ClientOf == null && (retry < 10 || totalTryTime < Vsync.VSYNC_DEFAULTTIMEOUT * 3); retry++)
{
if (Vsync.ClientOf == null)
{
tryToJoin();
}
theDelay = Math.Min(6000, theDelay);
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLineLog("WARNING: Found the ORACLE but it was still restarting... delaying " + theDelay + "ms before retrying join request... " + retry);
}
Vsync.Sleep(theDelay);
totalTryTime += theDelay;
Vsync.WriteLog(".");
theDelay += 500;
}
Vsync.WriteLog(string.Empty);
if (Vsync.ClientOf == null && (!ORACLE.HasFirstView || Vsync.ORACLE.theView.GetMyRank() == -1))
{
throw new VsyncException("Vsync ORACLE is " + Vsync.foundOracle + " but attempt to connect with it failed");
}
}
if (cantBeOracle && Vsync.ClientOf == null)
{
if (++OracleTries < 2)
{
again = true;
continue;
}
if (Vsync.foundOracle == null)
{
throw new VsyncException("I can't be the ORACLE but was unable to contact the ORACLE in VSYNC_UNICAST_ONLY mode");
}
throw new VsyncException("ORACLE is " + Vsync.foundOracle + " but I was unable to connect with it in VSYNC_UNICAST_ONLY mode");
}
}
new Thread(Group.GroupMemberHeartBeat) { Name = "Vsync All-Groups HeartBeat thread", IsBackground = true }.Start();
if (Vsync.ClientOf == null)
{
new Thread(Vsync.ORACLE.OracleHeartBeat) { Name = "Vsync <ORACLE> HeartBeat thread", IsBackground = true }.Start();
if (!ORACLE.HasFirstView && (VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("Restarting the ORACLE in " + VsyncSystem.GetState());
}
}
return Vsync.my_IPaddress;
}
catch (Exception e)
{
Vsync.WriteLine("VsyncLib: Initialization error <" + e + ">");
VsyncSystem.VsyncActive = false;
}
return null;
}
private const string hex = "0123456789ABCDEF";
private static byte[] byteVecParse(string arg)
{
int idx = 0;
if (arg.Length != (VSYNC_AES.KeySize * 2))
{
throw new VsyncException("VSYNC_AESKEY: argument has incorrect length (should be a " + VSYNC_AES.KeySize + "-byte/" + (VSYNC_AES.KeySize * 8) + "-bit vector, encoded as a hexstring)");
}
byte[] bvec = new byte[VSYNC_AES.KeySize];
for (int off = 0; off < bvec.Length; off++)
{
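// Each byte is encoded as two hex digits, high nibble first; IndexOf returns -1 for anything not in the "hex" table
int hi = hex.IndexOf(arg[idx++]);
int lo = hex.IndexOf(arg[idx++]);
if (hi < 0 || lo < 0)
{
throw new VsyncException("VSYNC_AESKEY: argument contains a character that is not an uppercase hex digit");
}
bvec[off] = (byte)((hi << 4) | lo);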
}
return bvec;
}
private static readonly IPAddress nullIPAddr = new IPAddress(0L);
private static IPAddress[] ExtractHostIPAddrs(string hlist)
{
if (string.IsNullOrEmpty(hlist))
{
if (Vsync.VSYNC_UNICAST_ONLY)
{
hlist = "localhost";
}
else
{
return null;
}
}
int nContacts = 0;
hlist += ",";
for (int i = 0; i < hlist.Length; i++)
{
if (hlist[i] == ',')
{
nContacts++;
}
}
IPAddress[] newVSYNC_HOSTS = new IPAddress[nContacts];
nContacts = 0;
for (int i = 0; i < hlist.Length; i++)
{
if (hlist[i] == ',')
{
string hname = hlist.Substring(0, i);
if (hname.Equals("localhost", StringComparison.Ordinal) || hname.Equals("0.0.0.0", StringComparison.Ordinal))
{
newVSYNC_HOSTS[nContacts++] = nullIPAddr;
}
else
{
newVSYNC_HOSTS[nContacts++] = LastIPv4(hname);
}
hlist = hlist.Substring(i + 1);
i = 0;
}
}
return newVSYNC_HOSTS;
}
internal static bool IPv4AddressIsAllowed(IPAddress address)
{
if (Vsync.VSYNC_NETMASK_ADDR == null || Vsync.VSYNC_SUBNET_ADDR == null)
{
return true;
}
if (address.AddressFamily == AddressFamily.InterNetwork)
{
// IPv4
byte[] addressOctets = address.GetAddressBytes();
byte[] netmaskOctets = Vsync.VSYNC_NETMASK_ADDR.GetAddressBytes();
byte[] subnetOctets = Vsync.VSYNC_SUBNET_ADDR.GetAddressBytes();
return (subnetOctets[0] == (addressOctets[0] & netmaskOctets[0])) && (subnetOctets[1] == (addressOctets[1] & netmaskOctets[1])) && (subnetOctets[2] == (addressOctets[2] & netmaskOctets[2])) && (subnetOctets[3] == (addressOctets[3] & netmaskOctets[3]));
}
// IPv6
return false;
}
private static readonly Dictionary<string, IPAddress> IPv4Map = new Dictionary<string, IPAddress>();
private static readonly LockObject IPv4MapLock = new LockObject("IPv4MapLock");
internal static IPAddress LastIPv4(string hname)
{
IPAddress theIPAddr;
using (var tmpLockObj = new LockAndElevate(IPv4MapLock))
{
if (IPv4Map.TryGetValue(hname, out theIPAddr))
{
return theIPAddr;
}
theIPAddr = _LastIPv4(hname);
if (theIPAddr != null)
{
IPv4Map.Add(hname, theIPAddr);
}
}
return theIPAddr;
}
internal static IPAddress _LastIPv4(string hname)
{
IPAddress rval;
IPAddress[] list;
if (IPAddress.TryParse(hname, out rval))
{
return rval;
}
try
{
#if __MonoCS__
Vsync.Sleep((Vsync.my_pid % 1000) * 3);
#endif
list = Dns.GetHostAddresses(hname);
}
catch (SocketException)
{
Vsync.WriteLine("Warning: IP Address or hostname <" + hname + "> is unknown");
return new IPAddress(0);
}
foreach (IPAddress a in list)
{
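// hname is already known not to be an IP literal; parse into a scratch variable so that a match already held in rval isn't overwritten
IPAddress scratch;
if (!string.IsNullOrEmpty(hname) && !IPAddress.TryParse(hname, out scratch))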
{
if (IPv4AddressIsAllowed(a))
{
rval = a;
}
}
else
{
if (a.AddressFamily == AddressFamily.InterNetwork)
{
rval = a; // Picks last in the list
}
}
}
if (rval != null)
{
return rval;
}
throw new VsyncException("IPv6 support: not yet implemented");
}
private static int tryCount;
private static void tryToJoin()
{
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("Sending the Oracle a JOIN request -- myAddress " + Vsync.my_address + (Vsync.foundOracle == null ? string.Empty : ", Oracle is " + Vsync.foundOracle));
}
int mode = Group.CREATE | Group.JOIN;
if (VSYNC_CANJOINORACLE)
{
mode |= Group.CANBEORACLE;
}
if (Vsync.foundOracle == null || ++tryCount % 2 == 0)
{
ORACLE.doSend(false, false, Vsync.JOIN, Vsync.my_address, mode, new[] { "ORACLE" }, new[] { ORACLE.gaddr }, 0L, new[] { 0L }, new[] { 0 }, ++VsyncSystem.VsyncJoinCounter);
}
else
{
ORACLE.doP2PSend(Vsync.foundOracle, true, Vsync.JOIN, Vsync.my_address, mode, new[] { "ORACLE" }, new[] { ORACLE.gaddr }, 0L, new[] { 0L }, new[] { 0 }, ++VsyncSystem.VsyncJoinCounter);
}
}
private static bool IAmInHostList()
{
return VSYNC_HOSTS_IPADDRS == null || (VSYNC_MY_IPADDRS == null ? VSYNC_HOSTS_IPADDRS.Contains(my_address.home) : VSYNC_HOSTS_IPADDRS.Intersect(VSYNC_MY_IPADDRS).Any());
}
internal static int pingCntr;
internal static bool IWasLeader = false;
internal static void SetupIM()
{
Vsync.VSYNCMEMBERS = new Group("VSYNCMEMBERS");
if (Vsync.VSYNC_LARGE)
{
Vsync.VSYNCMEMBERS.SetLarge();
}
MCMDSocket.Setup(Vsync.VSYNCMEMBERS);
VSYNCMEMBERS.ViewHandlers *= v =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine(v.ToString());
}
Vsync.VSYNC_ASYNCMTOTALLIMIT = Math.Min(Vsync.VSYNC_MINASYNCMTOTAL, Math.Max(2, 1250 / v.members.Length));
List<Group> gc = Group.VsyncGroupsClone();
foreach (Address who in v.leavers)
{
ReliableSender.NodeHasFailed(who);
foreach (Group g in gc)
{
Group.GroupNoteFailure(g, who);
}
}
if (v.IAmLeader())
{
if (v.viewid == 0)
{
IWasLeader = true;
}
else if (!IWasLeader)
{
IWasLeader = true;
AnnounceMCMDMapping();
}
}
VSYNCMEMBERS.OOBNewView(v);
};
VSYNCMEMBERS.doRegister(Vsync.FDETECTION, new Action<Address>(who => NodeHasFailed(who, "Informed by some remote caller", true)));
VSYNCMEMBERS.doRegister(Vsync.INQUIRE, new Action(() =>
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
while (VsyncSystem.VsyncActive)
{
if (VsyncSystem.waitForWorkerSetup.WaitOne(120000))
{
VsyncSystem.RTS.ThreadCntrs[3]++;
VSYNCMEMBERS.doNullReply();
return;
}
}
}));
VSYNCMEMBERS.doRegister(Vsync.INQUIRE, new Action<Address, int>((gaddr, vid) => VSYNCMEMBERS.doReply(ReliableSender.getMinStable(gaddr, vid))));
VSYNCMEMBERS.doRegister(Vsync.BECLIENT, new Action<string>(gname => VSYNCMEMBERS.doReply(Client.GetTSigs(gname))));
VSYNCMEMBERS.doRegister(Vsync.BECLIENT, new Action<string, Address>(Client.ResetRep));
VSYNCMEMBERS.doRegister(Vsync.PING, new Action<string>(message =>
{
Vsync.WriteLine("PING: [" + Vsync.MsToSecs(Vsync.NOW) + "]: Received <" + message + "> in view " + Vsync.VSYNCMEMBERS.theView.viewid);
pingCntr++;
}));
VSYNCMEMBERS.doRegister(Vsync.PING, new Action(() => { }));
VSYNCMEMBERS.doRegister(Vsync.IBADDRS, new Action<Address, Address[], int[], int[], int>((who, rmembers, p2pqps, ackqps, lid) =>
{
for (int r = 0; r < rmembers.Length; r++)
{
if (rmembers[r].isMyAddress())
{
IB.noteRemoteIB(who, rmembers, p2pqps[r], ackqps[r], lid);
break;
}
}
}));
VSYNCMEMBERS.doRegister(Vsync.IBADDRS, new Action<Address, bool>(IB.noteRemoteIB));
VSYNCMEMBERS.SetupIMTunnels();
if (Vsync.WORKER_MODE)
{
return;
}
if (Vsync.IAmOracle && ORACLE.theView.IAmLeader())
{
VSYNCMEMBERS.Create();
}
else
{
VSYNCMEMBERS.Join();
}
}
internal static void AnnounceMCMDMapping()
{
if (Vsync.VSYNCMEMBERS == null || !Vsync.VSYNCMEMBERS.HasFirstView || !Vsync.VSYNCMEMBERS.theView.IAmLeader())
{
return;
}
if ((VsyncSystem.Debug & VsyncSystem.MCMDMAP) != 0)
{
Vsync.WriteLine("Broadcasting new MCMD Mapping!");
}
int[,] theMap = MCMDSocket.MCMDvirtual.GetMapAll();
if (theMap != null)
{
Vsync.VSYNCMEMBERS.doSend(false, false, Vsync.REMAP, Vsync.MapperEpochId, MCMDSocket.nextPhysIPAddr, theMap);
}
}
internal class PendingLeaderOps
{
// Depending on which constructor is used, the DAL logic handles either reliable message forwarding OR at-most-once execution by a leader.
// The DAL code must keep track of which case it is dealing with, or havoc can ensue.
internal Group group;
internal Address Sender;
internal Msg reqMsg;
internal Msg replyMsg;
internal int uid;
internal ThreadStart doTheAction;
internal PendingLeaderOps(Group g, Address s, int u, ThreadStart d)
{
this.group = g;
this.Sender = s;
this.uid = u;
this.doTheAction = d;
this.reqMsg = g.getReplyToAndClear();
}
internal PendingLeaderOps(Group g, Address s, int u, Msg rqmsg, Msg rep)
{
this.group = g;
this.Sender = s;
this.uid = u;
this.reqMsg = rqmsg;
this.replyMsg = rep;
}
}
internal static LockObject PendingLeaderOpsLock = new LockObject("PendingLeaderOpsLock");
internal static List<PendingLeaderOps> PendingLeaderOpsList = new List<PendingLeaderOps>();
internal static string GetPLLState()
{
using (var tmpLockObj = new LockAndElevate(PendingLeaderOpsLock))
{
if (PendingLeaderOpsList.Count == 0)
{
return string.Empty;
}
string s = "List of callbacks for DoAsLeader requests:" + Environment.NewLine;
foreach (PendingLeaderOps plo in PendingLeaderOpsList)
{
s += " Group <" + plo.group.gname + ">, Sender " + plo.Sender + ", uid=" + plo.uid + Environment.NewLine;
}
return s;
}
}
internal static void PendingLeaderViewChange(View v)
{
if (!ORACLE.LeaderMode && v.IAmLeader())
{
TakeOverAsOracle();
}
// If someone gets added to the ORACLE, make sure to delete them from the CanBeOracleList
// Obviously the leader won't find them on the list, but the other group members will
// This way when we look for a candidate to replace a departing ORACLE member, we won't get
// confused (in the VUProtocol) by entries that aren't actually valid candidates
if (v.joiners.Length > 0)
{
using (var tmpLockObj = new LockAndElevate(CanBeOracleListLock))
{
foreach (Address who in v.joiners)
{
CanBeOracleList.Remove(who);
}
}
}
if (!v.IAmLeader())
{
return;
}
List<PendingLeaderOps> callbackList = new List<PendingLeaderOps>();
using (var tmpLockObj = new LockAndElevate(PendingLeaderOpsLock))
{
List<PendingLeaderOps> newPendingLeaderOpsList = new List<PendingLeaderOps>();
foreach (PendingLeaderOps outerPlo in PendingLeaderOpsList)
{
if (outerPlo.group.gaddr != v.gaddr)
{
newPendingLeaderOpsList.Add(outerPlo);
}
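else if (outerPlo.replyMsg == null)
{
// No reply was recorded, so the previous leader never completed this request: queue the action to be redone here
callbackList.Add(outerPlo);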
}
else if (outerPlo.replyMsg != null)
{
PendingLeaderOps plo = outerPlo;
new Thread(() =>
{
try
{
Msg.InvokeFromBArray(plo.replyMsg.payload, new Action<Address, byte, Address, byte[], int, int>((gaddr, type, destProc, buffer, vid, MsgID) =>
{
Group g = Group.doLookup(gaddr);
if (g != null)
{
ReliableSender.doSend(false, ReliableSender.my_p2psocket, g, type, destProc, buffer, vid, MsgID, true, null);
}
plo.group.doSend(false, false, DALDONE, plo.group.gaddr, plo.Sender, plo.uid);
}));
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "DAL Worker thread spawned in PendingLeaderViewChange", IsBackground = true }.Start();
}
}
PendingLeaderOpsList = newPendingLeaderOpsList;
}
foreach (PendingLeaderOps outerPlo in callbackList)
{
PendingLeaderOps plo = outerPlo;
new Thread(() =>
{
try
{
Group g = Group.doLookup(plo.group.gaddr);
if (g == null)
{
throw new VsyncException("DAL callback: group not found");
}
g.setReplyTo(plo.reqMsg);
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Msg replyTo = plo.reqMsg;
Vsync.WriteLine("DAL callback: In group <" + g.gname + ">, Thread DoAsLeaderCallback, setting theMsg=" + replyTo.vid + ":" + replyTo.msgid + ", needsreply=" + ((replyTo.flags & Msg.NEEDSREPLY) != 0));
}
if (plo.doTheAction != null)
{
plo.doTheAction();
}
g.clearReplyTo();
g.doUnorderedSend(DALDONE, g.gaddr, plo.Sender, plo.uid);
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "DoAsLeader Callback", IsBackground = true }.Start();
}
}
internal static void DALReplyNotify(Group g, Msg rmsg, PendingLeaderOps plos, Msg replyTo)
{
using (var tmpLockObj = new LockAndElevate(g.groupLock))
{
g.NotifyDALOnReply = null;
}
byte f = replyTo.flags;
replyTo.flags |= Msg.NEEDSREPLY;
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DALReplyNotify: Doing an unordered multicast to DALdone for " + plos.group.gname + ", rqsender=" + plos.Sender + ", uid=" + plos.uid + ", rmsg=" + rmsg.sender + "::" + rmsg.vid + ":" + rmsg.msgid);
}
if (g != ORACLE)
{
ORACLE.doUnorderedQueryToBA(Group.ALL, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_FAILURE, "DALDONE"), DALDONE, Vsync.my_address, plos.group.gaddr, plos.Sender, plos.uid, replyTo, rmsg, true);
}
else
{
ORACLE.doSend(false, false, DALDONE, Vsync.my_address, plos.group.gaddr, plos.Sender, plos.uid, replyTo, rmsg, false);
}
replyTo.flags = f;
}
// Should be called by all members of a group upon receipt of a virtually synchronous multicast containing some request
private static void DoAsLeader(Group g, Address Sender, int uid, ThreadStart theAction)
{
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DoAsLeader(<" + g.gname + ">, IAmLeader=" + g.theView.IAmLeader() + ", sender=" + Sender + ", uid=" + uid + ", action=" + theAction.Method + ")");
}
PendingLeaderOps plos = new PendingLeaderOps(g, Sender, uid, theAction);
if (!g.theView.IAmLeader())
{
using (var tmpLockObj = new LockAndElevate(PendingLeaderOpsLock))
{
foreach (PendingLeaderOps plo in PendingLeaderOpsList)
{
if (plo.group.gaddr == g.gaddr && plo.Sender == Sender && plo.uid == uid)
{
if (plo.replyMsg != null)
{
return;
}
Vsync.WriteLine("WARNING: DoAsLeader was called twice for the identical request! Gaddr " + g.gaddr + ", Sender " + Sender + ", UID " + uid);
}
}
PendingLeaderOpsList.Add(plos);
return;
}
}
using (var tmpLockObj = new LockAndElevate(ORACLE.groupLock))
{
ORACLE.NotifyDALOnReply = plos;
}
if (plos.reqMsg != null)
{
g.setReplyTo(plos.reqMsg);
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DAL: In group <" + g.gname + ">, setting theMsg=" + plos.reqMsg.vid + ":" + plos.reqMsg.msgid + ", needsreply=" + ((plos.reqMsg.flags & Msg.NEEDSREPLY) != 0));
}
}
plos.doTheAction();
if (plos.reqMsg != null && (plos.reqMsg.flags & Msg.NEEDSREPLY) != 0 && g.getReplyTo() != null)
{
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DoAsLeader: Sending NullReply() because the dal action routine failed to send a reply");
}
g.NullReply();
}
g.clearReplyTo();
// Notice the (legitimate) race condition here. If the DoAsLeader action doesn't send a reply (and JOIN doesn't), and the leader
// finishes the operation but then fails before the Send below occurs, the request will be repeated by the new leader.
// JOIN happens to be idempotent, which solves this particular problem. A request that isn't idempotent really MUST
// have a requested reply, even if the reply is just "OK" and will be ignored.
g.doUnorderedSend(DALDONE, g.gaddr, Sender, uid);
}
internal static int OracleJoinsUnderway;
internal static LockObject CanBeOracleListLock = new LockObject("CanBeOracleListLock");
internal static List<Address> CanBeOracleList = new List<Address>();
internal static List<Address> FDRunning = new List<Address>();
internal static void SetupORACLE()
{
ORACLE.RegisterViewHandler(PendingLeaderViewChange);
ORACLE.doRegister(DALDONE, new Action<Address, Address, Address, int, Msg, Msg, bool>((sender, gaddr, rqInitiator, uid, rqmsg, rmsg, needsReply) =>
{
if (sender.isMyAddress())
{
return;
}
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DALdone: Received a DAL done(1) event from " + sender + " for " + gaddr + ", sender=" + rqInitiator + ", uid=" + uid);
}
using (var tmpLockObj = new LockAndElevate(PendingLeaderOpsLock))
{
bool fnd = false;
foreach (PendingLeaderOps plo in PendingLeaderOpsList)
{
if (plo.group.gaddr == gaddr && plo.Sender == rqInitiator && plo.uid == uid)
{
plo.replyMsg = rmsg;
fnd = true;
break;
}
}
if (!fnd)
{
// Occurs if there is a race between the original request to the ORACLE and the Reply by the initial leader
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DoAsLeader(<ORACLE>, sender=" + sender + ", uid=" + uid + "), got a DONE message but didn't find a PLOS record, create one");
}
PendingLeaderOpsList.Add(new PendingLeaderOps(ORACLE, rqInitiator, uid, rqmsg, rmsg));
}
}
if (needsReply)
{
ORACLE.doReply("OK");
}
}));
ORACLE.doRegister(DALDONE, new Action<Address, Address, int>((gaddr, sender, uid) =>
{
if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DALdone(2): sender " + sender + " gaddr " + gaddr + ", uid=" + uid);
}
using (var tmpLockObj = new LockAndElevate(PendingLeaderOpsLock))
{
List<PendingLeaderOps> newPendingLeaderOpsList = new List<PendingLeaderOps>();
foreach (PendingLeaderOps plo in PendingLeaderOpsList)
{
if (plo.group.gaddr != gaddr || plo.Sender != sender || plo.uid != uid)
{
newPendingLeaderOpsList.Add(plo);
}
else if ((VsyncSystem.Debug & VsyncSystem.DALLOGIC) != 0)
{
Vsync.WriteLine("DALdone(2): Remove DAL record for " + sender + " for " + gaddr + ", uid=" + uid);
}
}
PendingLeaderOpsList = newPendingLeaderOpsList;
}
}));
ORACLE.doRegister(JOIN, new Action<Address>(who =>
{
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("Heard from the ORACLE, his address is " + who);
}
Vsync.foundOracle = who;
// This actually prevents ME from dinging HIM as faulty
ReliableSender.nodeInStartup(who);
}));
ORACLE.doRegister(ORACLERUNNING, new Action<Address>(who =>
{
Vsync.WriteLine("ORACLERUNNING: " + who);
View theView;
using (var tmpLockObj = new LockAndElevate(ORACLE.ViewLock))
{
theView = ORACLE.theView;
}
if (!VsyncSystem.VsyncActive || !ORACLE.IAmLeader() || my_address.CompareTo(who) <= 0 || theView == null || theView.leavers.Contains(who))
{
return;
}
if (VSYNCMEMBERS.HasFirstView)
{
VSYNCMEMBERS.doSend(false, false, PARTITIONED, who);
Vsync.Sleep(1000);
throw new VsyncException("ORACLE: Discovered I am in a minority partition" + VsyncSystem.GetState());
}
}));
ORACLE.doRegister(JOIN, new Action<Address, int, string[], Address[], long, long[], int[], int>((who, mode, gnames, gaddrs, offset, tsigs, flags, uid) =>
{
// JOIN actually uses a different scheme to ensure fault-tolerance, implemented by the VUProtocol
// In future work I really should merge these into the DoAsLeader pattern but there are some subtle
// issues because of the way that group view events in the ORACLE itself need to be handled
if (gaddrs.Length == 1 && gaddrs[0] == ORACLE.gaddr && ORACLE.theView != null)
{
if (ORACLE.theView.GetRawRankOf(who) != -1)
{
// Ignore artifacts of the start sequence, which can involve asking to join multiple times
return;
}
if (ORACLE.theView.GetMyRank() != -1)
{
// Inhibit creation of a second oracle, in case leader is slow to respond
if (!who.isMyAddress())
{
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("Received a JOIN inquiry in ORACLE, sending a message to inhibit creation of new ORACLEs to " + who);
}
Vsync.ORACLE.doPureP2PSend(who, true, Vsync.JOIN, Vsync.my_address);
// Deals with sluggish C# class loader, which can lock out I/O for many seconds at a time during startup
ReliableSender.nodeInStartup(who);
}
}
}
if ((VsyncSystem.Debug & (VsyncSystem.STARTSEQ | VsyncSystem.GROUPEVENTS | VsyncSystem.VIEWCHANGE)) != 0)
{
string gns = string.Empty, isls = " ";
foreach (string s in gnames)
{
gns += " " + s;
}
foreach (int f in flags)
{
isls += f + " ";
}
Vsync.WriteLine("Oracle received a JOIN <" + gns + " >, gaddrs=" + Address.VectorToString(gaddrs) + ", flags={" + isls + "}, request... my_address " + Vsync.my_address + ", joiner address " + who);
}
if (gnames.Length == 1 && gnames[0].Equals("ORACLE", StringComparison.Ordinal))
{
if (who.isMyAddress())
{
return;
}
using (var tmpLockObj = new LockAndElevate(recent_inquiries_lock))
{
foreach (Address a in recent_inquiries)
{
if (a == who)
{
return;
}
}
recent_inquiries.Add(who);
{
// Using...
Address ri = who;
Vsync.OnTimer(5000, () =>
{
using (var tmpLockObj1 = new LockAndElevate(recent_inquiries_lock))
{
recent_inquiries.Remove(ri);
}
});
}
}
bool coreOracleJoiner = false;
bool isACandidate = false;
using (var tmpLockObj = new LockAndElevate(Vsync.ORACLE.groupLock))
{
if (ORACLE.HasFirstView && (mode & Group.CANBEORACLE) != 0)
{
if (ORACLE.theView.nLive() < VSYNC_ORACLESIZE - OracleJoinsUnderway)
{
// If we get here, we're currently short on ORACLE members and had better do something about it
if (VSYNC_HOSTS_IPADDRS == null)
{
++OracleJoinsUnderway;
coreOracleJoiner = true;
}
else
{
foreach (IPAddress ipa in VSYNC_HOSTS_IPADDRS)
{
if (ipa.Equals(who.home))
{
++OracleJoinsUnderway;
coreOracleJoiner = true;
break;
}
}
}
}
else if (ORACLE.IAmLeader())
{
isACandidate = true;
}
}
}
if (isACandidate)
{
// Pass the word: this is a candidate to join the oracle if the need ever arises
ORACLE.Send(Vsync.CANBEORACLE, who);
}
if (!coreOracleJoiner)
{
if (!ORACLE.IAmLeader())
{
return;
}
SendInitialOracleLeaderInfo(who, Vsync.my_address);
return;
}
}
if ((VsyncSystem.Debug & (VsyncSystem.STARTSEQ | VsyncSystem.GROUPEVENTS)) != 0)
{
Vsync.WriteLine("Initiating VUProtocol in ORACLE.Join");
}
bool fnd = false;
foreach (string gn in gnames)
{
if (Group.TrackingProxyLookup(gn) != null)
{
fnd = true;
break;
}
}
if (!fnd)
{
foreach (Address ga in gaddrs)
{
if (Group.TrackingProxyLookup(ga) != null)
{
fnd = true;
break;
}
}
}
if (fnd)
{
new Thread(() =>
{
try
{
MCMDSocket.InitializeMap(who, gnames, gaddrs);
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "MCMDSocket.InitializeMap", IsBackground = true }.Start();
}
VUProtocol(JOIN, who, mode, gnames, gaddrs, offset, tsigs, flags, uid);
}));
ORACLE.RegisterHandler(Vsync.CANBEORACLE, new Action<Address>(who =>
{
using (var tmpLockObj = new LockAndElevate(CanBeOracleListLock))
{
if (!CanBeOracleList.Contains(who))
{
CanBeOracleList.Add(who);
}
}
}));
ORACLE.doRegister(LEAVE, new Action<Address, int, string[], Address[], int[], int>((who, mode, gnames, gaddrs, flags, uid) =>
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Initiating VUProtocol in ORACLE.Leave");
}
VUProtocol(LEAVE, who, mode, gnames, gaddrs, 0L, null, flags, uid);
}));
ORACLE.doRegister(TERMINATE, new Action<Address, int, Address[]>((who, uid, gaddrs) =>
{
List<Group> glist = new List<Group>();
List<Group> lglist = new List<Group>();
foreach (Address a in gaddrs)
{
Group g = Group.TrackingProxyLookup(a);
if (g == null)
{
continue;
}
if ((g.flags & Group.G_ISLARGE) == 0)
{
glist.Add(g);
}
else
{
lglist.Add(g);
}
}
if (glist.Count != 0)
{
Group.doMultiSend(glist, true, TERMINATE);
}
if (lglist.Count != 0)
{
foreach (Group g in lglist)
{
g.P2PSend(g.theView.members[0], Vsync.RELAYSEND, new Msg(TERMINATE));
}
}
foreach (Address a in gaddrs)
{
Group g = Group.TrackingProxyLookup(a);
if (g == null)
{
continue;
}
using (var tmpLockObj = new LockAndElevate(Group.TPGroupsLock))
{
Group.TPGroups.Remove(g.gaddr);
}
}
}));
ORACLE.doRegister(Vsync.RELAYJOIN, new Action<Address, int, int, string[], Address[], long, long[], int[]>((joiner, uid, mode, gnames, gaddrs, offset, tsigs, flags) => Vsync.DoAsLeader(ORACLE, joiner, uid, () =>
{
if ((VsyncSystem.Debug & VsyncSystem.RELAYLOGIC) != 0)
{
Vsync.WriteLine("In DAL for Vsync.RELAYJOIN: sender " + joiner + " vectors of length " + gnames.Length);
for (int n = 0; n < gnames.Length; n++)
{
Vsync.WriteLine(" gname " + gnames[n] + ", gaddrs " + gaddrs[n] + ", gsigs " + tsigs[n]);
}
}
ORACLE.doSend(false, false, Vsync.JOIN, joiner, mode, gnames, gaddrs, offset, tsigs, flags, ++VsyncSystem.VsyncJoinCounter);
ORACLE.doReply("OK");
})));
ORACLE.doRegister(Vsync.RELAYLEAVE, new Action<Address, int, string[], Address[], int[]>((sender, uid, gnames, gaddrs, flags) => Vsync.DoAsLeader(ORACLE, sender, uid, () =>
{
if ((VsyncSystem.Debug & VsyncSystem.RELAYLOGIC) != 0)
{
Vsync.WriteLine("In Vsync.RELAYLEAVE: sender " + sender + " vectors of length " + gnames.Length);
for (int n = 0; n < gnames.Length; n++)
{
Vsync.WriteLine(" gname " + gnames[n] + ", gaddrs " + gaddrs[n]);
}
}
ORACLE.doSend(false, false, Vsync.LEAVE, sender, 0, gnames, gaddrs, flags, ++VsyncSystem.VsyncJoinCounter);
ORACLE.doReply("OK");
})));
ORACLE.doRegister(Vsync.RELAYTERM, new Action<Address, int, Address[]>((who, uid, gaddrs) => Vsync.DoAsLeader(ORACLE, who, uid, () =>
{
ORACLE.doSend(false, false, Vsync.TERMINATE, Vsync.my_address, ORACLE.uids++, gaddrs);
ORACLE.doReply("OK");
})));
ORACLE.doRegister(Vsync.ISSTABLE, new Action<Address, int>(Vsync.clearOldVDS));
ORACLE.doRegister(FDETECTION, new Action<Address>(who =>
{
// Another special situation since failure may have caused the prior leader to crash
// Todo: Look for a way to fold this into the DoAsLeader pattern used above
// For now the needed pattern is implemented separately in the VUProtocol logic
List<Group> glist = new List<Group>(), theClone = Group.VsyncAllGroupsClone(false);
bool OracleMemberFailed = false;
bool IwasOldLeader = ORACLE.LeaderMode;
bool IamNewLeader = false;
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.FAILURES)) != 0)
{
Vsync.WriteLine("FDETECTION message received for " + who + " (I was old leader: " + IwasOldLeader + ")");
}
if (who.isMyAddress())
{
VsyncSystem.GotPoison("Failure detection broadcast reported my demise");
}
if (!VsyncSystem.VsyncActive || VsyncSystem.VsyncRestarting || VSYNCMEMBERS == null || !VSYNCMEMBERS.HasFirstView)
{
return;
}
ReliableSender.NodeHasFailed(who);
foreach (Group g in theClone)
{
using (new ILock(ILock.LLBRIEF, g.gaddr))
{
if (g.theView == null || g.theView.GetRawRankOf(who) == -1)
{
continue;
}
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.FAILURES)) != 0)
{
Vsync.WriteLine("FDETECTION(Vsync_Groups) calling View.NoteFailed in <" + g.gname + "> for " + who);
}
View.noteFailed(g, who);
if (g == ORACLE)
{
OracleMemberFailed = true;
if (!IwasOldLeader && g.theView.IAmLeader())
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("I am new leader!");
}
IamNewLeader = true;
}
else if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("In scan of g.ORACLE I am NOT the new leader because IwasOldLeader=" + IwasOldLeader + " and g.theView.IAmLeader=" + g.theView.IAmLeader() + " in view " + g.theView);
}
}
else
{
glist.Add(g);
}
}
}
if (Vsync.ClientOf == null && ORACLE.IAmLeader())
{
View v;
using (var tmpLockObj = new LockAndElevate(ORACLE.ViewLock))
{
v = ORACLE.theView;
}
// Break any wait states within the ORACLE itself. This avoids deadlock
foreach (Address a in v.members)
{
ORACLE.doP2PSend(a, true, FANNOUNCE, who);
}
}
// Now send a virtually synchronous announcement but do it in a separate thread
new Thread(() =>
{
try
{
VSYNCMEMBERS.Send(FANNOUNCE, who);
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "Send FANNOUNCE " + who, IsBackground = true }.Start();
if (Vsync.ClientOf != null)
{
return;
}
// Code run only by members of the ORACLE group
if (OracleMemberFailed)
{
// Handled separately because it may trigger an immediate add of someone, and we don't want to add that same process
// to any other groups that might have failed and would be listed in glist. So we break it into two events.
VUProtocol(LEAVE, who, 0, new string[1] { ORACLE.gname }, new Address[1] { ORACLE.gaddr }, 0L, null, null, -1);
}
if (glist.Count > 0)
{
VUProtocol(LEAVE, who, 0, glist.Select(g => g.gname).Distinct(StringComparer.Ordinal).ToArray(), glist.Select(g => g.gaddr).Distinct().ToArray(), 0L, null, null, -1);
}
if (!IamNewLeader || OracleViewTaskRunning || Vsync.RunningLeaderLogic)
{
return;
}
Vsync.RunningLeaderLogic = true;
Vsync.LeaderId += ORACLE.theView.GetMyRank();
ReliableSender.SendPoison(who, Vsync.my_address + " believes that you have failed");
new Thread(() =>
{
try
{
Dictionary<Address, bool> OnceAndFutureOracleMembers = new Dictionary<Address, bool>();
foreach (Address a in ORACLE.theView.members)
{
OnceAndFutureOracleMembers.Add(a, false);
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Sending INQUIRE message...");
}
ORACLE.theView.isFinal = false;
List<Address> mustContact = new List<Address>();
int scannedTo = -1;
ORACLE.doQueryInvoke(Group.ALL, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT * 20, Timeout.TO_FAILURE, "INQUIRE"), INQUIRE, my_address, new MergeProposals((from, hisLeader, vds) => doMergeProposals(from, hisLeader, vds, OnceAndFutureOracleMembers, mustContact, ref scannedTo)));
while (mustContact.Count > 0)
{
List<Address> contacting = mustContact;
mustContact = new List<Address>();
foreach (Address a in contacting)
{
List<Address> whoReplied = new List<Address>();
List<Address> hisLeader = new List<Address>();
List<ViewDelta[]> vds = new List<ViewDelta[]>();
VSYNCMEMBERS.doP2PQuery(a, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT * 20, Timeout.TO_FAILURE, "INQUIRE"), INQUIRE, my_address, Group.EOL, whoReplied, hisLeader, vds);
if (whoReplied.Count > 0)
{
doMergeProposals(whoReplied.ToArray(), hisLeader.ToArray(), vds.ToArray(), OnceAndFutureOracleMembers, mustContact, ref scannedTo);
}
}
}
int rCnt = 0;
foreach (KeyValuePair<Address, bool> kvp in OnceAndFutureOracleMembers)
{
if (kvp.Value)
{
++rCnt;
}
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
string resps = " ";
foreach (KeyValuePair<Address, bool> kvp in OnceAndFutureOracleMembers)
{
resps += "<" + kvp.Key + "::" + kvp.Value + "> ";
}
Vsync.WriteLine("After INQUIRE: Contacted " + resps + " total of " + rCnt + " replies, needed " + ((OnceAndFutureOracleMembers.Count + 1) / 2));
Vsync.WriteLine("After INQUIRE proposed contains...");
if (Vsync.Proposed == null)
{
Vsync.WriteLine(" ... Vsync.Proposed is null");
}
else
{
for (int p = 0; p < Vsync.Proposed.Length; p++)
{
Vsync.WriteLine("PROPOSAL[" + p + "]=" + Vsync.Proposed[p]);
}
}
Vsync.WriteLine("Starting the OracleViewTask thread");
}
if (rCnt < (OnceAndFutureOracleMembers.Count + 1) / 2)
{
throw new VsyncException("New ORACLE leader was unable to contact a quorum of once and future ORACLE members");
}
// Runs only in the current leader... and now, that's me!
TakeOverAsOracle();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { IsBackground = true }.Start();
}));
ORACLE.doRegister(BECLIENT, new Action<string>(gname => ORACLE.doReply(Client.SelectHisRep(gname))));
ORACLE.doRegister(RELAYREGISTERVG, new Action<Address, Address, Address[]>((sender, nVGA, members) => ORACLE.doSend(false, false, REGISTERVG, sender, nVGA, members)));
ORACLE.doRegister(REGISTERVG, new Action<Address, Address, Address[]>(Group.noteVGMap));
ORACLE.GroupOpen = ORACLE.WasOpen = true;
ReliableSender.StartGroupReader(ORACLE);
}
private static void TakeOverAsOracle()
{
if (VSYNCMEMBERS == null)
{
return;
}
ORACLE.LeaderMode = true;
ORACLE.TakingOver = true;
Vsync.IAmOracle = true;
Vsync.OracleViewThread = new Thread(Vsync.OracleViewTask) { Name = "Vsync <ORACLE> View Thread", IsBackground = true };
Vsync.OracleViewThread.Start();
MCMDSocket.RunMappingTask();
VSYNCMEMBERS.Send(BECLIENT, Vsync.my_address);
}
// As we INQUIRE we learn about new proposals, which we merge into the existing list. We may also encounter newly added
// ORACLE members (joiners named in a proposal), which are queued on mustContact so that they get an INQUIRE too.
// Also handles the so-called "dueling leaders" situation, in which there are two proposals for the same slot.
internal static void doMergeProposals(Address[] from, Address[] hisLeader, ViewDelta[][] vds, Dictionary<Address, bool> OnceAndFutureOracleMembers, List<Address> mustContact, ref int scannedTo)
{
ViewDelta[] proposed = Vsync.Proposed ?? new ViewDelta[0];
for (int v = 0; v < from.Length; v++)
{
if (OnceAndFutureOracleMembers.ContainsKey(from[v]))
{
OnceAndFutureOracleMembers[from[v]] = true;
}
else
{
throw new VsyncException("Unexpected ORACLE INQUIRY response: from " + from[v] + " but I didn't think I had contacted him!");
}
ViewDelta[] vd = vds[v];
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Received INQUIRE replies...");
for (int vv = 0; vv < vd.Length; vv++)
{
Vsync.WriteLine("REPLY[" + v + ":" + vv + "]=" + vd[vv]);
}
}
bool useProposal = false;
bool ignoreIt = false;
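// Re-read the current proposal: an earlier reply in this batch may have replaced Vsync.Proposed (possibly with a shorter one)
proposed = Vsync.Proposed ?? new ViewDelta[0];
int minLen = Math.Min(vd.Length, proposed.Length);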
for (int i = 0; i < minLen; i++)
{
if (vd[i].leaderId > Vsync.Proposed[i].leaderId)
{
// Dueling leaders (very rare): in this case we just encountered a "better" proposal
// We'll switch to it even if it is shorter
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Dueling leaders! Found a better proposal, switching to it");
}
useProposal = true;
}
else if (vd[i].leaderId < Vsync.Proposed[i].leaderId)
{
// Dueling leaders (very rare): in this case we just encountered an old "stale" proposal
// We'll ignore it, even if it is longer
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Dueling leaders! New proposal is inferior, ignoring it");
}
ignoreIt = true;
break;
}
if (i > scannedTo && vd[i].gaddr == ORACLE.gaddr && vd[i].joiners.Length > 0)
{
scannedTo = i;
foreach (Address a in vd[i].joiners)
{
if (!OnceAndFutureOracleMembers.ContainsKey(a))
{
OnceAndFutureOracleMembers.Add(a, false);
mustContact.Add(a);
}
}
}
}
if (!ignoreIt && (Vsync.Proposed == null || vd.Length > Vsync.Proposed.Length || useProposal))
{
Vsync.Proposed = vd;
}
}
}
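// Illustrative sketch only (not part of the original Vsync source): the decision rule that doMergeProposals
// applies to each INQUIRE reply, restated over bare leader-id arrays so it can be read in isolation.
// The names ProposalMergeSketch, Decision and Decide are hypothetical, and treating a null "current"
// array as "no proposal yet" is an assumption made for the sketch.
internal static class ProposalMergeSketch
{
internal enum Decision
{
KeepCurrent,
AdoptIncoming
}
internal static Decision Decide(long[] current, long[] incoming)
{
if (current == null)
{
return Decision.AdoptIncoming;
}
bool better = false;
int minLen = System.Math.Min(current.Length, incoming.Length);
for (int i = 0; i < minLen; i++)
{
if (incoming[i] > current[i])
{
// A higher leader id in the same slot marks a "better" proposal
better = true;
}
else if (incoming[i] < current[i])
{
// A lower leader id marks a stale proposal: ignore it outright
return Decision.KeepCurrent;
}
}
// Otherwise adopt the incoming proposal only if it is better or strictly longer
return (better || incoming.Length > current.Length) ? Decision.AdoptIncoming : Decision.KeepCurrent;
}
}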
internal static void SendInitialOracleLeaderInfo(Address who, Address OracleLeader)
{
if (who.isMyAddress() || Vsync.ORACLE.theView.GetRankOf(who) != -1)
{
return;
}
if ((VsyncSystem.Debug & (VsyncSystem.STARTSEQ | VsyncSystem.VIEWCHANGE | VsyncSystem.GROUPEVENTS)) != 0)
{
Vsync.WriteLine("Oracle leader sending a NULL INITIALVIEW message to a future CLIENT, my_address " + Vsync.my_address + ", new client is " + who + ", ORACLE view " + ORACLE.theView);
}
ORACLE.doP2PSend(who, true, INITIALVIEW, OracleLeader);
}
internal class GVEvent
{
internal int request;
internal Address who;
internal int mode;
internal int uid;
internal string[] gnames;
internal Address[] gaddrs;
internal long offset;
internal long[] tsigs;
internal int[] flags;
internal GVEvent(int r, Address a, int m, string[] gns, Address[] gs, long off, long[] ts, int[] fl, int u)
{
this.request = r;
this.who = a;
this.mode = m;
this.gnames = gns;
this.gaddrs = gs;
this.offset = off;
this.tsigs = ts;
this.flags = fl;
this.uid = u;
}
public override string ToString()
{
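// gnames and flags are null in ANTI-GVE records, so guard against that here
string gs = " ";
if (this.gnames != null)
{
foreach (string s in this.gnames)
{
gs += s + " ";
}
}
string fs = " ";
if (this.flags != null)
{
foreach (int fl in this.flags)
{
fs += fl + " ";
}
}
return "GVE: request=" + rToString(this.request) + ", who=" + this.who + ", mode=" + this.mode + ", gnames={" + gs + "}, gaddrs=" + Address.VectorToString(this.gaddrs) + ", flags={" + fs + "}, uid=" + this.uid;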
}
}
internal static List<GVEvent> GVEList = new List<GVEvent>();
internal static LockObject GVELock = new LockObject("GVELock");
internal static List<GVEvent> AGVEList = new List<GVEvent>();
internal static LockObject AGVELock = new LockObject("AGVELock");
internal static string GetGVEState()
{
string s = "Group view events list:" + Environment.NewLine;
using (var tmpLockObj = new LockAndElevate(GVELock))
{
if (GVEList.Count > 0)
{
foreach (GVEvent gve in GVEList)
{
string gns = " ";
if (gve.gnames != null)
{
foreach (string gs in gve.gnames)
{
gns += "<" + gs + ">";
}
}
s += " Action[" + gve.uid + "]: " + rToString(gve.request) + " " + gve.who + " on groups {" + gns + "}, gaddrs {" + Address.VectorToString(gve.gaddrs) + "}" + Environment.NewLine;
}
}
}
// Omit unless debugging the GVE logic
if ((VsyncSystem.Debug & VsyncSystem.GVELOGIC) != 0)
{
using (var tmpLockObj = new LockAndElevate(AGVELock))
{
if (AGVEList.Count > 0)
{
foreach (GVEvent gve in AGVEList)
{
string gns = " ";
if (gve.gnames != null)
{
foreach (string gs in gve.gnames)
{
gns += gs;
}
}
s += " Anti-action[" + gve.uid + "]: " + gve.who + Environment.NewLine;
}
}
}
}
return s;
}
/// <exclude>
/// <summary>
/// Internal
/// </summary>
/// </exclude>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class ViewDelta : IEquatable<ViewDelta>
{
[ProtoMember(1)]
public string gname = string.Empty;
[ProtoMember(2)]
public readonly Address gaddr;
[ProtoMember(3)]
public long tsig;
[ProtoMember(4)]
public long leaderId;
[ProtoMember(5)]
public int[] mcmdmap;
[ProtoMember(6)]
public int prevVid;
[ProtoMember(7)]
public bool isLarge;
// i'th element gives the final incoming message count from the i'th member of the previous view
[ProtoMember(8)]
public int[] lastSeqns;
[ProtoMember(9)]
public Address[] joiners;
[ProtoMember(10)]
public long offset;
[ProtoMember(11)]
public Address[] leavers;
#if PROTOCOL_BUFFERS
[ProtoAfterDeserialization]
private void AfterDeserialize()
{
if (this.gname == null)
{
this.gname = string.Empty;
}
if (this.mcmdmap == null)
{
this.mcmdmap = new int[0];
}
if (this.lastSeqns == null)
{
this.lastSeqns = new int[0];
}
if (this.joiners == null)
{
this.joiners = new Address[0];
}
if (this.leavers == null)
{
this.leavers = new Address[0];
}
}
#else
public ViewDelta()
{
}
#endif
internal ViewDelta(string name, Address ga, long ts, int[] mm, int v, int nm, Address[] wantJoin, long off, Address[] wantLeave, bool lf)
{
this.leaderId = Vsync.LeaderId;
this.gname = name;
this.gaddr = ga;
this.tsig = ts;
this.mcmdmap = mm;
this.prevVid = v;
this.joiners = wantJoin;
this.offset = off;
this.leavers = wantLeave;
this.isLarge = lf;
this.lastSeqns = this.isLarge ? new int[0] : new int[nm];
// Where not large, these won't be final until the COMMIT event.
}
internal ViewDelta(string name, Address g, long off, int[] mm, int v, int[] ls, bool lf)
{
this.leaderId = Vsync.LeaderId;
this.gname = name;
this.gaddr = g;
this.mcmdmap = mm;
this.prevVid = v;
this.lastSeqns = ls;
this.isLarge = lf;
this.offset = off;
this.joiners = this.leavers = new Address[0];
}
public static bool operator ==(ViewDelta first, ViewDelta second)
{
return Equals(first, second);
}
public static bool operator !=(ViewDelta first, ViewDelta second)
{
return !Equals(first, second);
}
public static bool Equals(ViewDelta first, ViewDelta second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (object.ReferenceEquals(first, null) || object.ReferenceEquals(second, null))
{
return false;
}
return first.gaddr == second.gaddr && first.prevVid == second.prevVid;
}
public override bool Equals(object other)
{
return Equals(this, other as ViewDelta);
}
public bool Equals(ViewDelta other)
{
return Equals(this, other);
}
public override int GetHashCode()
{
return this.gaddr.GetHashCode() ^ this.prevVid.GetHashCode();
}
public override string ToString()
{
return " LeaderId=" + this.leaderId + ", Group <" + this.gname + "> " + this.gaddr + " (mmap " + MCMDSocket.PMCAddr(this.mcmdmap[0]) + ":" + MCMDSocket.PMCAddr(this.mcmdmap[1]) + "), isLarge=" + this.isLarge + ", prevVid " + this.prevVid + this.idsToVec() + ", Joining: {" + Address.VectorToString(Expand(this.joiners)) + "}, Leaving: {" + Address.VectorToString(Expand(this.leavers)) + "}";
}
private string idsToVec()
{
if (this.lastSeqns == null || this.lastSeqns.Length == 0)
{
return string.Empty;
}
string s = ", final msg counts: {";
foreach (int i in this.lastSeqns)
{
s = s + " " + i + " ";
}
return s + "}";
}
}
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class UnstableList : IEquatable<UnstableList>
{
[ProtoMember(1)]
public readonly Address gaddr;
[ProtoMember(2)]
public Address flusher;
[ProtoMember(3)]
public readonly Address sender;
[ProtoMember(4)]
public readonly int vid;
[ProtoMember(5)]
public int mid_low;
[ProtoMember(6)]
public int mid_hi;
#if !PROTOCOL_BUFFERS
public UnstableList()
{
}
#endif
internal UnstableList(Address g, Address f, Address s, int v, int ml, int mh)
{
this.gaddr = g;
this.flusher = f;
this.sender = s;
this.vid = v;
this.mid_low = ml;
this.mid_hi = mh;
}
public static bool operator ==(UnstableList first, UnstableList second)
{
return Equals(first, second);
}
public static bool operator !=(UnstableList first, UnstableList second)
{
return !Equals(first, second);
}
public static bool Equals(UnstableList first, UnstableList second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (object.ReferenceEquals(first, null) || object.ReferenceEquals(second, null))
{
return false;
}
return first.gaddr == second.gaddr && first.sender == second.sender && first.vid == second.vid && first.mid_low == second.mid_low && first.mid_hi == second.mid_hi;
}
public override bool Equals(object other)
{
return Equals(this, other as UnstableList);
}
public bool Equals(UnstableList other)
{
return Equals(this, other);
}
public override int GetHashCode()
{
return this.gaddr.GetHashCode() ^ this.sender.GetHashCode() ^ this.vid.GetHashCode() ^ this.mid_low.GetHashCode() ^ this.mid_hi.GetHashCode();
}
public override string ToString()
{
return " Group " + this.gaddr + ", Flusher " + this.flusher + ": MSG[sender " + this.sender + ", ID " + this.vid + ":(" + this.mid_low + "-" + this.mid_hi + ")]";
}
}
private delegate void MergeProposals(Address[] who, Address[] hisLeader, ViewDelta[][] vds);
internal static void VUProtocol(int request, Address who, int mode, string[] gnames, Address[] gaddrs, long offset, long[] tsigs, int[] flags, int uid)
{
if (uid != -1)
{
using (var tmpLockObj = new LockAndElevate(AGVELock))
{
foreach (GVEvent agve in AGVEList)
{
if (agve.who == who && agve.uid == uid)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("Found an old ANTI-GVE entry... ignoring this VUProtocol request");
}
AGVEList.Remove(agve);
return;
}
}
}
}
if (gnames == null)
{
gnames = new string[0];
}
if (gaddrs == null)
{
gaddrs = new Address[0];
}
using (var tmpLockObj = new LockAndElevate(GVELock))
{
bool fnd = false;
GVEvent gve = new GVEvent(request, who, mode, gnames, gaddrs, offset, tsigs, flags, uid);
foreach (GVEvent oldgve in GVEList)
{
if (oldgve.who == who && oldgve.mode == mode && oldgve.gnames.Length == gnames.Length)
{
for (int i = 0; i < gnames.Length; i++)
{
if (oldgve.gnames[i].Equals(gnames[i], StringComparison.Ordinal) && oldgve.gaddrs[i] == gaddrs[i] && ((oldgve.tsigs == null || tsigs == null) ? oldgve.tsigs == tsigs : oldgve.tsigs[i] == tsigs[i]))
{
fnd = true;
break;
}
}
}
}
if (!fnd)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
string gs = " ", tss = " ", fs = " ";
foreach (string s in gnames)
{
gs += s + " ";
}
if (tsigs != null)
{
foreach (long ts in tsigs)
{
tss += ts + " ";
}
}
if (flags != null)
{
foreach (int fl in flags)
{
fs += fl + " ";
}
}
Vsync.WriteLine("Creating a GVE entry for request " + request + ", mode " + mode + ", address " + who + ", uid " + uid + ", gnames [" + gs + "], groups " + Address.VectorToString(gaddrs) + ", ts={" + tss + "}, fs={" + fs + "}");
}
GVEList.Add(gve);
}
else if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("Found an identical GVE entry... ignoring this VUProtocol request");
}
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("VUProtocol: BarrierRelease for LLWAIT/LGVEUPDATE");
}
ILock.Barrier(ILock.LLWAIT, ILock.LGVEUPDATE).BarrierRelease(1);
}
internal static void PurgeGVE(Address who, int uid)
{
if (uid == -1 || ORACLE.theView.IAmLeader())
{
return;
}
using (var tmpLockObj = new LockAndElevate(GVELock))
{
foreach (GVEvent gve in GVEList)
{
if (who == gve.who && uid == gve.uid)
{
GVEList.Remove(gve);
return;
}
}
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("... didn't find the GVE entry, creating an ANTI-GVE record");
}
GVEvent agve = new GVEvent(0, who, 0, null, null, 0L, null, null, uid);
using (var tmpLockObj = new LockAndElevate(AGVELock))
{
AGVEList.Add(agve);
}
Vsync.OnTimer(120000, () =>
{
using (var tmpLockObj = new LockAndElevate(AGVELock))
{
foreach (GVEvent gve in AGVEList)
{
if (gve == agve)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("... after 120s delay, removing an ANTI-GVE record");
}
AGVEList.Remove(agve);
return;
}
}
}
});
}
// Only runs in the leader
private static bool OracleViewTaskRunning;
private static readonly Semaphore OracleBeaconTaskWait = new Semaphore(0, int.MaxValue);
private static readonly LockObject OracleViewTaskLock = new LockObject("OracleViewTaskLock");
internal static void OracleViewTask()
{
using (var tmpLockObj = new LockAndElevate(OracleViewTaskLock))
{
if (OracleViewTaskRunning)
{
return;
}
OracleViewTaskRunning = true;
}
new Thread(() =>
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
try
{
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[4]++;
// Once every 10 seconds, announce that I am an Oracle leader
OracleBeaconTaskWait.WaitOne(10 * 1000);
ORACLE.doSendRaw(Vsync.ORACLERUNNING, Vsync.my_address);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "OracleBeaconTask", IsBackground = true }.Start();
try
{
while (!VsyncSystem.VsyncActive || Vsync.VSYNCMEMBERS == null)
{
Vsync.Sleep(250);
}
List<GVEvent> gveList = new List<GVEvent>();
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[5]++;
GVEvent gve;
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("OracleViewTask: Before BarrierWait for LLWAIT/LGVEUPDATE");
}
ILock.Barrier(ILock.LLWAIT, ILock.LGVEUPDATE).BarrierWait();
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("OracleViewTask: After BarrierWait for LLWAIT/LGVEUPDATE");
}
using (var tmpLockObj = new LockAndElevate(GVELock))
{
if (GVEList.Count == 0)
{
continue;
}
gve = GVEList.First();
GVEList.Remove(gve);
gveList.Add(gve);
bool alreadyExists = true;
foreach (Address gaddr in gve.gaddrs)
{
Group tpg = Group.TrackingProxyLookup(gaddr);
if (tpg == null || !tpg.HasFirstView)
{
alreadyExists = false;
}
}
if (alreadyExists)
{
foreach (GVEvent gve2 in GVEList)
{
if (Address.SameAddrs(gve.gaddrs, gve2.gaddrs) && Address.SameNames(gve.gnames, gve2.gnames))
{
gveList.Add(gve2);
}
}
foreach (GVEvent gve2 in gveList)
{
if (gve2 != gve)
{
GVEList.Remove(gve2);
}
}
}
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
string gns = " ";
if (gve.gnames != null)
{
foreach (string gs in gve.gnames)
{
gns += gs + " ";
}
}
Vsync.WriteLine("Request GVUpdate: gve=(mode:" + gve.mode + ", request:" + rToString(gve.request) + ", who: " + gve.who + ", uid:" + gve.uid + ", gnames:{" + gns + "}, gaddrs:[" + Address.VectorToString(gve.gaddrs) + "])");
}
if (!ORACLE.TakingOver || (gve.request == LEAVE && ORACLE.theView.GetRawRankOf(gve.who) == ORACLE.theView.GetRawRankOf(Vsync.my_address) - 1))
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("Calling RequestGVUpdates, gve list contains " + gveList.Count + " gve items");
}
RequestGVUpdates(gveList);
gveList = new List<GVEvent>();
ORACLE.TakingOver = false;
}
bool sendIt = false;
using (var tmpLockObj = new LockAndElevate(GVELock))
{
if (GVEList.Count == 0 && Vsync.VSYNCMEMBERS.HasFirstView)
{
sendIt = true;
}
}
if (sendIt)
{
// Although the system also has a way to do this on a per-group basis, the volume of P2P traffic it caused was excessive
Group.IPMCViewCast(Vsync.VSYNCMEMBERS.theView.viewid, Vsync.VSYNCMEMBERS.gaddr, Vsync.my_address, Vsync.VSYNCMEMBERS.theView);
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
finally
{
OracleBeaconTaskWait.Release();
}
VsyncSystem.ThreadTerminationMagic();
}
// Core of the virtual synchrony implementation, runs only in the leader. The list will only contain one element unless
// there is a new leader taking over in the ORACLE and it discovers one or more proposed events that it needs to order ahead
// of the ORACLE leader FAIL proposal
internal static void RequestGVUpdates(List<GVEvent> gveList)
{
List<Group> AggLGWithNewOwner = new List<Group>();
// Large groups that will have a new group owner as a result of this GVEUpdate
List<Group> AggCreateList = new List<Group>();
List<Group> AggProposeGlist = new List<Group>();
// Don't participate in the 2PC (currently: regular groups being created by this action)
List<Group> AggCommitGlist = new List<Group>();
// Vsync groups, they participate in the 2PC, as does the ORACLE
List<Group> AggLargeGlist = new List<Group>();
// Large groups. The ORACLE runs the 2PC and then unilaterally tells them to commit the new view
List<ViewDelta> vdlist = new List<ViewDelta>();
foreach (GVEvent gve in gveList)
{
if (gve.request == JOIN)
{
for (int idx = 0; idx < gve.gnames.Length; idx++)
{
string gn = gve.gnames[idx];
if (!gn.Equals("ORACLE", StringComparison.Ordinal) && !gn.Equals("VSYNCMEMBERS", StringComparison.Ordinal))
{
Group tpg = Group.TrackingProxyLookup(gn);
if (tpg != null && tpg.TypeSig != 0 && tpg.TypeSig != gve.tsigs[idx])
{
ReliableSender.SendPoison(gve.who, "TypeSignature mismatch in group <" + gn + ">");
Vsync.Sleep(50);
Vsync.NodeHasFailed(gve.who, "TypeSignature mismatch", false);
return;
}
}
}
}
}
foreach (GVEvent gve in gveList)
{
List<Group> LGWithNewOwner = new List<Group>();
// Large groups that will have a new group owner as a result of this GVEUpdate
List<Group> CreateList = new List<Group>();
List<Group> ProposeGlist = new List<Group>();
// Don't participate in the 2PC (currently: regular groups being created by this action)
List<Group> CommitGlist = new List<Group>();
// Vsync groups, they participate in the 2PC, as does the ORACLE
List<Group> LargeGlist = new List<Group>();
// Large groups. The ORACLE runs the 2PC and then unilaterally tells them to commit the new view
if (gve.request == FDETECTION || gve.request == LEAVE)
{
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("VUProtocol: FDETECTION/LEAVE event");
}
if (ORACLE.theView.GetRawRankOf(gve.who) != -1)
{
CommitGlist.Add(ORACLE);
ProposeGlist.Add(ORACLE);
}
using (var tmpLockObj = new LockAndElevate(Group.TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in Group.TPGroups)
{
Group g = kvp.Value;
int r;
if ((gve.request == LEAVE && !gve.gnames.Contains(g.gname)) || (r = g.theView.GetRawRankOf(gve.who)) == -1)
{
continue;
}
if ((g.flags & Group.G_ISLARGE) != 0)
{
LargeGlist.Add(g);
if (r == 0)
{
LGWithNewOwner.Add(g);
}
}
else if (!CommitGlist.Contains(g))
{
CommitGlist.Add(g);
if ((g.flags & Group.G_ISRAW) == 0)
{
ProposeGlist.Add(g);
}
}
}
}
}
else
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
string gs = " ", tss = " ", fs = " ";
foreach (string s in gve.gnames)
{
gs += s + " ";
}
if (gve.tsigs != null)
{
foreach (long ts in gve.tsigs)
{
tss += ts + " ";
}
}
if (gve.flags != null)
{
foreach (int fl in gve.flags)
{
fs += fl + " ";
}
}
Vsync.WriteLine("VUProtocol: executing GVE event[" + gve.uid + "]: request " + Vsync.rToString(gve.request) + " mode " + gve.mode + " address " + gve.who + " gnames[" + gs + "] groups " + Address.VectorToString(gve.gaddrs) + ", ts={" + tss + "}, fs={" + fs + "}");
}
for (int i = 0; i < gve.gnames.Length; i++)
{
Group g;
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("VUProtocol: group name is " + gve.gnames[i]);
}
if (gve.gnames[i].Equals("ORACLE", StringComparison.Ordinal))
{
g = ORACLE;
}
else
{
if ((g = Group.TrackingProxyLookup(gve.gaddrs[i])) == null || g.theView == null || g.theView.members.Length == 0)
{
if ((gve.mode & Group.CREATE) == 0)
{
// Group.JOIN only but doesn't exist
ORACLE.doP2PSend(gve.who, true, JOINFAILED, gve.gaddrs[i], "JoinExisting but group <" + gve.gnames[i] + "> didn't exist");
return;
}
// Create tracking proxy for groups if needed
g = Group.TrackingProxy(gve.gnames[i], "VUP", gve.gaddrs[i], gve.tsigs[i], null, new View(gve.gnames[i], gve.gaddrs[i], new[] { gve.who }, -1, false), gve.flags[i], false);
CreateList.Add(g);
continue;
}
if ((gve.mode & Group.JOIN) == 0)
{
// Group.CREATE only, but the group already exists
ORACLE.doP2PSend(gve.who, true, JOINFAILED, gve.gaddrs[i], "Create but group <" + gve.gnames[i] + "> already exists");
return;
}
}
if ((gve.flags[i] & Group.G_ISLARGE) != 0)
{
LargeGlist.Add(g);
}
else
{
CommitGlist.Add(g);
if (g.theView != null && g.theView.members.Length > 0 && (g.flags & Group.G_ISRAW) == 0)
{
ProposeGlist.Add(g);
}
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
foreach (Group g in CreateList)
{
Vsync.WriteLine("+++[+] Create GroupList contains <" + g.gname + ">");
}
foreach (Group g in ProposeGlist)
{
Vsync.WriteLine("+++[P] Proposed GroupList contains <" + g.gname + ">");
}
foreach (Group g in CommitGlist)
{
Vsync.WriteLine("+++[C] Commit GroupList contains <" + g.gname + ">");
}
foreach (Group g in LargeGlist)
{
Vsync.WriteLine("+++[L] Large GroupList contains <" + g.gname + ">");
}
}
int nProposed = Vsync.Proposed == null ? 0 : Vsync.Proposed.Length;
Address[] wantJoin = null;
Address[] wantLeave = null;
if (gve.request == JOIN)
{
bool fnd = false;
foreach (ViewDelta vd in vdlist)
{
if (vd.leavers.Length == 0 && vd.leaderId == LeaderId)
{
foreach (Address ga in gve.gaddrs)
{
if (ga == vd.gaddr)
{
fnd = true;
Vsync.ArrayResize(ref vd.joiners, vd.joiners.Length + 1);
vd.joiners[vd.joiners.Length - 1] = gve.who;
}
}
}
}
if (!fnd)
{
foreach (ViewDelta vd in vdlist)
{
if (gve.gaddrs.Contains(vd.gaddr) && vd.joiners.Contains(gve.who))
{
fnd = true;
break;
}
}
if (!fnd)
{
wantJoin = new[] { gve.who };
wantLeave = new Address[0];
}
}
}
else
{
bool fnd = false;
foreach (ViewDelta vd in vdlist)
{
if (vd.joiners.Length == 0 && vd.leaderId == LeaderId)
{
foreach (Address ga in gve.gaddrs)
{
if (ga == vd.gaddr)
{
fnd = true;
Vsync.ArrayResize(ref vd.leavers, vd.leavers.Length + 1);
vd.leavers[vd.leavers.Length - 1] = gve.who;
}
}
}
}
if (!fnd)
{
foreach (ViewDelta vd in vdlist)
{
if (gve.gaddrs.Contains(vd.gaddr) && vd.leavers.Contains(gve.who))
{
fnd = true;
break;
}
}
if (!fnd)
{
wantLeave = new[] { gve.who };
wantJoin = new Address[0];
}
}
}
// If the ORACLE group has a failure that will take it below VSYNC_ORACLESIZE, see if we have any available candidates
// that could replace the departing member(s) and if so, add them to a list of proposed joiners.
if (gve.gaddrs.Length == 1 && gve.gaddrs[0] == ORACLE.gaddr && wantLeave != null && wantLeave.Length > 0)
{
using (var tmpLockObj = new LockAndElevate(CanBeOracleListLock))
{
while (CanBeOracleList.Count > 0)
{
Address a = CanBeOracleList.First();
CanBeOracleList.Remove(a);
using (var tmpLockObj1 = new LockAndElevate(RIPLock))
{
if (ORACLE.GetRankOf(a) != -1 || RIPList.Contains(a))
{
continue;
}
}
wantJoin = new[] { a };
break;
}
}
}
if (nProposed > 0)
{
// Pending stuff I know (or learned) about
foreach (ViewDelta vd in Vsync.Proposed)
{
vdlist.Add(vd);
}
}
Vsync.Proposed = null;
if (wantJoin != null && wantLeave != null)
{
foreach (Group g in CommitGlist)
{
AddVD(vdlist, wantJoin, wantLeave, g, gve.offset);
}
foreach (Group g in LargeGlist)
{
AddVD(vdlist, wantJoin, wantLeave, g, gve.offset);
}
foreach (Group g in CreateList)
{
AddVD(vdlist, wantJoin, wantLeave, g, gve.offset);
}
}
// Now sweep everything into the corresponding aggregated lists and repeat if there are more GVE entries, which happens only if
// a new leader is taking over and needs to include one or more pending proposals with the LEAVE for the old ORACLE leader
AddUnique(AggLGWithNewOwner, LGWithNewOwner);
AddUnique(AggCreateList, CreateList);
AddUnique(AggProposeGlist, ProposeGlist);
AddUnique(AggCommitGlist, CommitGlist);
AddUnique(AggLargeGlist, LargeGlist);
}
if (!AggCommitGlist.Contains(ORACLE))
{
AggProposeGlist.Add(ORACLE);
AggCommitGlist.Add(ORACLE);
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("VUProtocol: Lists [ " + AggCommitGlist.Count + " commit, " + AggProposeGlist.Count + " propose, " + AggLargeGlist.Count + " Large(SameOwner), " + AggLGWithNewOwner.Count + " Large(NewOwner) " + AggCreateList.Count + " create]");
}
ViewDelta[] vds = vdlist.ToArray();
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("VUProtocol: About to Propose View Deltas:");
foreach (ViewDelta vd in vds)
{
Vsync.WriteLine(" " + vd);
}
}
////List<Address> whoFailed = new List<Address>();
// For each group in glist, we'll get a reply from each member, in the form of a byte-vector
List<byte[]>[] ba = null;
// This will initialize whoFailed with the list of processes shown as failed
// for groups in glist. No point in expecting them to reply to the PROPOSE solicitation
bool must_loop = true;
UnstableList[] usl = new UnstableList[0];
int oldLen = 0;
int nreplies = 0;
int loopLimit = CountLive(AggProposeGlist);
int nExpected = loopLimit++;
while (must_loop)
{
if (loopLimit-- < 0)
{
string gstr = " ", vdStr = string.Empty, rstr = " ";
foreach (Group g in AggProposeGlist)
{
gstr += g.gname + " ";
}
foreach (ViewDelta vd in vds)
{
vdStr += ">> " + vd + Environment.NewLine;
}
for (int gn = 0; gn < ba.Length; gn++)
{
rstr += ba[gn].Count + " ";
}
throw new VsyncException("Trapped looping in PROPOSE to <" + gstr + ">! (nreplies " + nreplies + ", nExpected " + nExpected + " (" + rstr + "), usl.Length " + usl.Length + ", oldLen " + oldLen + ")" + Environment.NewLine + "ViewDeltas:" + Environment.NewLine + vdStr + VsyncSystem.GetState());
}
must_loop = false;
oldLen = usl.Length;
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
string dests = " ";
foreach (Group g in AggProposeGlist)
{
dests += g.gname + " ";
}
Vsync.WriteLine("Sending the PROPOSE messages to [" + dests + "]");
}
// Flush can take a long time, so disable the timeout
ba = Group.doMultiQuery(AggProposeGlist, Group.ALL, true, new Timeout(int.MaxValue, Timeout.TO_FAILURE, "PROPOSE"), PROPOSE, vds, usl);
nreplies = 0;
for (int gn = 0; gn < ba.Length; gn++)
{
nreplies += ba[gn].Count;
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("After multiquery tabulate results from " + ba.Length + " groups that were queried (total of " + nreplies + " distinct replies)");
}
int threshold = 0;
if (!VSYNC_IGNOREPARTITIONS && (ORACLE.theView.members.Length > 2 || !VSYNC_IGNORESMALLPARTITIONS))
{
threshold = ((ORACLE.theView.members.Length / 2) + 1);
}
if (Tabulate(AggProposeGlist, ba, vds, ref usl) < threshold)
{
if (loopLimit > 0)
{
List<Address> ag = new List<Address>();
foreach (Group g in AggProposeGlist)
{
ag.Add(g.gaddr);
}
Vsync.WriteLine("After doMultiQuery to " + Address.VectorToString(ag.ToArray()) + " got fewer than " + ((ORACLE.theView.members.Length / 2) + 1) + " replies!!!");
Vsync.WriteLine("WARNING: ORACLE got " + nreplies + " replies, but was expecting at least " + ((ORACLE.theView.members.Length / 2) + 1) + "... retry proposal");
must_loop = true;
continue;
}
else
{
List<Address> ag = new List<Address>();
foreach (Group g in AggProposeGlist)
{
ag.Add(g.gaddr);
}
Vsync.WriteLine("After doMultiQuery to " + Address.VectorToString(ag.ToArray()) + " got fewer than " + ((ORACLE.theView.members.Length / 2) + 1) + " replies!!!");
throw new VsyncException("VSYNC experienced a loss of majority (expected " + nExpected + "), terminating.");
}
}
if (nreplies != nExpected || usl.Length > oldLen)
{
int newExpected = CountLive(AggProposeGlist);
if (newExpected == nExpected)
{
break;
}
must_loop = true;
nExpected = newExpected;
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("After tabulate must_loop " + must_loop + " (nreplies " + nreplies + ", nExpected " + nExpected + "; usl.Length " + usl.Length + ", oldLen " + oldLen + ")");
}
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.Write("Sending the COMMIT messages to ");
foreach (Group g in AggCommitGlist)
{
Vsync.Write("<" + g.gname + ">");
}
Vsync.WriteLine();
}
Address[] whos = new Address[gveList.Count];
int[] uids = new int[gveList.Count];
int n = 0;
foreach (GVEvent gve in gveList)
{
whos[n] = gve.who;
uids[n] = gve.uid;
++n;
}
foreach (ViewDelta vd in vds)
{
foreach (Address who in Expand(vd.leavers))
{
Group g = Group.TrackingProxyLookup(vd.gaddr);
if (g != null && g.theView != null)
{
ReliableSender.SendP2P(Msg.ISGRPP2P, who, g, g.theView.viewid, ReliableSender.P2PSequencer.NextP2PSeqn("commit", who), Msg.toBArray(COMMIT, vds, whos, uids), true, null, null);
}
}
}
// Now perform the COMMIT operations, first for groups other than the ORACLE, and then for the ORACLE itself
List<Address> vdsg = new List<Address>();
foreach (ViewDelta vd in vds)
{
if (!vdsg.Contains(vd.gaddr) && Vsync.ORACLE.gaddr != vd.gaddr)
{
vdsg.Add(vd.gaddr);
}
}
if (vdsg.Count > 0)
{
Group.doMultiSend(AggCommitGlist.Where(g => vdsg.Contains(g.gaddr)).ToList(), true, COMMIT, vds, whos, uids);
foreach (Group g in AggCommitGlist)
{
if (g != Vsync.ORACLE)
{
g.Flush(Vsync.VSYNC_ACKTHRESHOLD);
}
}
}
Vsync.ORACLE.doSend(true, false, COMMIT, vds, whos, uids);
if (AggLargeGlist.Count > 0)
{
foreach (Group g in AggLargeGlist)
{
sendVDS(g, vds, AggLGWithNewOwner.Contains(g));
}
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Calling CommitWait");
}
ILock.Barrier(ILock.LLWAIT, ILock.LCOMMIT).BarrierWait();
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("After CommitWait");
}
}
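// Illustrative sketch only (not part of the original Vsync source): the reply threshold that RequestGVUpdates
// compares against the Tabulate result, written as a pure function. QuorumSketch and Threshold are hypothetical
// names; the parameters stand in for ORACLE.theView.members.Length, VSYNC_IGNOREPARTITIONS and
// VSYNC_IGNORESMALLPARTITIONS respectively.
internal static class QuorumSketch
{
internal static int Threshold(int oracleSize, bool ignorePartitions, bool ignoreSmallPartitions)
{
if (!ignorePartitions && (oracleSize > 2 || !ignoreSmallPartitions))
{
// Strict majority of the ORACLE membership
return (oracleSize / 2) + 1;
}
// Partition checking is disabled (or the ORACLE is small and small partitions are ignored): no minimum
return 0;
}
}
// For example, Threshold(5, false, false) is 3, while Threshold(2, false, true) is 0.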
// This is actually called from the ORACLE commit logic and is called in each ORACLE member. As a result we redundantly send
// INITIALVIEW messages to the group members, but the advantage is that the scheme is tolerant of a crash of the ORACLE leader
// during a group join. Obviously one could clean this up using logic similar to the DoAsLeader scheme, but because the INITIALVIEW
// is being sent to a process not currently in the group, there is no simple way to get around asking that joining process if it
// has the INITIALVIEW yet or not. That gets a bit tricky (given the asynchrony of the failure notification that breaks the connection
// from the old leader to the joining member). So we're paying a price in extra messages, which get ignored, and the benefit is
// simpler code (assuming you call this simpler... you may disagree!)
internal static void SendInitialView(ViewDelta[] vds)
{
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("SendInitialView.... view deltas:");
foreach (ViewDelta vd in vds)
{
Vsync.WriteLine("++++++ " + vd);
}
}
foreach (ViewDelta vd in vds)
{
Address[] joiners = Expand(vd.joiners);
if (vd.gaddr == ORACLE.gaddr)
{
foreach (Address who in joiners)
{
SendOracleInitialView(vd, who);
}
}
else
{
Group g = Group.TrackingProxyLookup(vd.gaddr);
View theView;
if (g == null)
{
throw new VsyncException("Can't find the group in VUCommit");
}
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
theView = g.theView;
}
if (theView == null)
{
throw new VsyncException("View is null for " + g.gname + " in VUCommit: ViewDelta=" + vd + VsyncSystem.GetState());
}
// The size tests are grouped so that the guards (multicast enabled, VSYNCMEMBERS available) apply to both of them
if (!Vsync.VSYNC_UNICAST_ONLY && Vsync.VSYNCMEMBERS != null && theView.members.Length > Vsync.VSYNC_ORACLESIZE && !joiners.Contains(Vsync.my_address) &&
g != Vsync.VSYNCMEMBERS && (joiners.Length > Vsync.VSYNC_INITVIAOOB / 10 || theView.members.Length > Vsync.VSYNC_INITVIAOOB))
{
SendViewViaOOB(g, theView, joiners, vds, vd);
}
else
{
foreach (Address who in joiners)
{
SendNonOracleInitialView(vds, vd, who);
}
}
}
}
}
private static void SendOracleInitialView(ViewDelta vd, Address newOracleMember)
{
string[] names;
Address[] gaddrs;
long[] tsigs;
View[] vs;
bool[] isl;
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("+++ newOracleMember true.");
}
int n;
using (var tmpLockObj = new LockAndElevate(Group.TPGroupsLock))
{
int ng = Group.TPGroups.Count + 1;
foreach (KeyValuePair<Address, Group> kvp in Group.TPGroups)
{
if (!kvp.Value.HasFirstView)
{
--ng;
}
}
names = new string[ng];
gaddrs = new Address[ng];
tsigs = new long[ng];
vs = new View[ng];
isl = new bool[ng];
n = 0;
foreach (KeyValuePair<Address, Group> kvp in Group.TPGroups)
{
if (kvp.Value.HasFirstView)
{
includeGroup(n++, names, gaddrs, tsigs, vs, isl, kvp.Value);
}
}
}
includeGroup(n, names, gaddrs, tsigs, vs, isl, Vsync.ORACLE);
MCMDSocket.SetMap("ORACLE:SendOracleInitialView", "ORACLE", false, MCMDSocket.GetMap(Vsync.ORACLE.gaddr, true));
// This sends the list of groups and tracking proxies to a new oracle member
// It also includes the view of the oracle itself
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC | VsyncSystem.STARTSEQ)) != 0)
{
Vsync.WriteLine("ORACLE.P2PSend INITIALVIEW for " + Address.VectorToString(gaddrs) + " from " + Vsync.my_address + " to " + newOracleMember + ", MCMD state = " + MCMDSocket.GetState());
}
int[,] mms = MCMDSocket.GetMap(gaddrs);
Address[] CBMs;
using (var tmpLockObj = new LockAndElevate(Vsync.CanBeOracleListLock))
{
CBMs = Vsync.CanBeOracleList.ToArray();
}
ORACLE.doP2PSend(newOracleMember, true, INITIALVIEW, Vsync.my_address, names, gaddrs, tsigs, vs, isl, MCMDSocket.nextPhysIPAddr, mms, CBMs);
}
private static void SendNonOracleInitialView(ViewDelta[] vds, ViewDelta vd, Address newGroupMember)
{
Group g = Group.TrackingProxyLookup(vd.gaddr);
if (g == null)
{
throw new VsyncException("Can't find the group in VUCommit");
}
if (g.theView == null)
{
throw new VsyncException("View is null for " + g.gname + " in VUCommit: ViewDelta=" + vd + VsyncSystem.GetState());
}
if ((g.flags & Group.G_ISLARGE) == 0 || vd.prevVid == -1)
{
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Group <" + g.gname + ">, sending INITIALVIEW to " + newGroupMember + ", VIEW = " + g.theView);
}
ReliableSender.SendP2P(Msg.ISGRPP2P, newGroupMember, g, g.theView.viewid, ReliableSender.P2PSequencer.NextP2PSeqn("intialview", newGroupMember), Msg.toBArray(INITIALVIEW, Vsync.my_address, g.theView, MCMDSocket.GetMap(g.gaddr, true), 0), true, null, null);
}
if ((g.flags & Group.G_ISLARGE) != 0)
{
Vsync.sendVDS(g, vds, vd.joiners.Length > 0);
}
}
private static void SendViewViaOOB(Group g, View theView, Address[] joiners, ViewDelta[] vds, ViewDelta vd)
{
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Group <" + g.gname + ">, sending INITIALVIEW via OOB to " + Address.VectorToString(joiners) + ", VIEW = " + g.theView);
}
byte[] ba = theView.toBArray();
string fname = g.gname;
int lio = fname.LastIndexOfAny(new[] { '\\', '/' });
if (lio != -1)
{
fname = fname.Substring(lio + 1);
}
fname += "-view" + theView.viewid + ".v" + Vsync.ORACLE.GetMyRank();
#if __MonoCS__
fname = "/tmp/" + fname;
#endif
bool redo = true;
int retries = 0;
MemoryMappedFile mmf = null;
do
{
redo = false;
mmf = null;
try
{
try
{
try
{
File.Delete(fname);
}
catch (IOException)
{
}
mmf = Group.CreateNew(fname, ba.Length);
// Dispose the accessor once the view bytes are written; the mapping itself stays open
using (MemoryMappedViewAccessor mva = mmf.CreateViewAccessor())
{
mva.WriteArray(0, ba, 0, ba.Length);
}
}
catch (IOException)
{
if (mmf != null)
{
mmf.Dispose();
}
mmf = MemoryMappedFile.OpenExisting(fname);
}
}
catch (FileNotFoundException)
{
if (++retries == 5)
{
throw new VsyncException("Tried and tried but SendViewViaOOB was unable to map " + fname);
}
redo = true;
}
}
while (redo);
Vsync.VSYNCMEMBERS.OOBRegister(true, fname, Vsync.VSYNCMEMBERS.gaddr, mmf, Vsync.ORACLE.theView.members.ToList());
List<Address> replicas = new List<Address> { Vsync.my_address };
foreach (Address j in joiners)
{
replicas.Add(j);
}
Vsync.VSYNCMEMBERS.OOBReReplicate(true, new List<Group.OOBRepInfo> { new Group.OOBRepInfo(fname, Vsync.VSYNCMEMBERS.gaddr, ba.Length, Group.OwnersToIPAddrs(replicas)) },
(oobfname, mf) =>
{
Thread.Sleep(Vsync.VSYNC_DEFAULTTIMEOUT * 3 / 2);
g.OOBDelete(oobfname);
});
foreach (Address newGroupMember in joiners)
{
ReliableSender.SendP2P(Msg.ISGRPP2P, newGroupMember, g, theView.viewid, ReliableSender.P2PSequencer.NextP2PSeqn("intialview", newGroupMember),
Msg.toBArray(INITIALVIEW, Vsync.my_address, fname, MCMDSocket.GetMap(g.gaddr, true), 0), true, null, null);
}
if ((g.flags & Group.G_ISLARGE) != 0)
{
Vsync.sendVDS(g, vds, vd.joiners.Length > 0);
}
}
private static void AddUnique(List<Group> AggregatedList, List<Group> GroupsToAdd)
{
foreach (Group g in GroupsToAdd)
{
if (!AggregatedList.Contains(g))
{
AggregatedList.Add(g);
}
}
}
private static void AddVD(List<ViewDelta> vdlist, Address[] wantJoin, Address[] wantLeave, Group g, long offset)
{
View theView;
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
theView = g.theView;
}
List<Address> prevMembers = theView == null ? new List<Address>() : theView.members.ToList();
int prevVid = -1;
foreach (ViewDelta vd in vdlist)
{
foreach (Address a in Expand(vd.leavers))
{
prevMembers.Remove(a);
}
foreach (Address a in Expand(vd.joiners))
{
prevMembers.Add(a);
}
if (prevMembers.Count == 0)
{
prevVid = -1;
}
else if (vd.gaddr == g.gaddr && vd.prevVid >= prevVid)
{
prevVid = vd.prevVid + 1;
}
}
if (prevVid == -1)
{
prevVid = g.theView.viewid;
}
vdlist.Add(new ViewDelta(g.gname, g.gaddr, g.TypeSig, g.MCMDMAP(), prevVid, g.theView.members.Length, wantJoin, offset, wantLeave, (g.flags & Group.G_ISLARGE) != 0));
}
/*
* Large group: Send the owner (only) the view deltas for that group
* If the new owner isn't the same as the previous owner, however, also send
* previous view deltas that he might have missed (e.g. not yet known to be stable)
*/
private static readonly LockObject oldVDSlock = new LockObject("oldVDSLock");
private static ViewDelta[] oldVDS;
private static long[] vdTimes;
internal static void clearOldVDS(Address gaddr, int StableTo)
{
using (var tmpLockObj = new LockAndElevate(oldVDSlock))
{
// Keep only this group's deltas that are not yet stable, and keep vdTimes index-aligned with oldVDS
List<ViewDelta> keptVDS = new List<ViewDelta>();
List<long> keptTimes = new List<long>();
if (oldVDS != null)
{
for (int i = 0; i < oldVDS.Length; i++)
{
if (oldVDS[i].gaddr == gaddr && oldVDS[i].prevVid > StableTo)
{
keptVDS.Add(oldVDS[i]);
keptTimes.Add(vdTimes != null && i < vdTimes.Length ? vdTimes[i] : Vsync.NOW);
}
}
}
oldVDS = keptVDS.ToArray();
vdTimes = keptTimes.ToArray();
}
}
private static void sendVDS(Group g, ViewDelta[] vds, bool includePrev)
{
ViewDelta[] vdsToApply;
int cnt = 0;
using (var tmpLockObj = new LockAndElevate(oldVDSlock))
{
// Count the number of ViewDeltas to send. First any lingering old ones
if (includePrev && oldVDS != null)
{
cnt = oldVDS.Length;
}
foreach (ViewDelta vd in vds)
{
if (vd.gaddr == g.gaddr)
{
++cnt;
}
}
if (cnt == 0)
{
return;
}
vdsToApply = new ViewDelta[cnt];
long[] times = new long[cnt];
cnt = 0;
if (includePrev && oldVDS != null)
{
for (int i = 0; i < oldVDS.Length; i++)
{
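// Only resend old deltas that were recorded within the last 3 minutes
if ((Vsync.NOW - vdTimes[i]) < 180000L && !vdDup(vdsToApply, cnt, oldVDS[i]))
{
// Carry over the timestamp of the entry being copied (index i), not of the output slot
times[cnt] = vdTimes[i];
vdsToApply[cnt++] = oldVDS[i];
}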
}
}
foreach (ViewDelta vd in vds)
{
if (vd.gaddr == g.gaddr && !vdDup(vdsToApply, cnt, vd))
{
times[cnt] = Vsync.NOW;
vdsToApply[cnt++] = vd;
}
}
if (cnt < vdsToApply.Length)
{
Vsync.ArrayResize(ref vdsToApply, cnt);
}
oldVDS = vdsToApply;
vdTimes = times;
}
g.gotNewViewDeltas(vdsToApply);
int ownerRank = 0;
while (ownerRank < g.theView.members.Length && (g.theView.hasFailed[ownerRank] || isLeaving(g.theView.members[ownerRank], vdsToApply)))
{
++ownerRank;
}
if (ownerRank == g.theView.members.Length)
{
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("SendVDS discovered that a large group experienced a total failure; terminating it");
}
g.Terminate();
return;
}
Address sendTo = g.theView.members[ownerRank];
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("Sending LgCOMMIT via RELAYSEND to LgOwner " + g.theView.members[ownerRank] + " for view deltas:");
foreach (ViewDelta vd in vdsToApply)
{
Vsync.WriteLine(" " + vd);
}
}
g.doP2PSend(sendTo, true, RELAYSEND, vdsToApply);
}
private static bool vdDup(ViewDelta[] vds, int cnt, ViewDelta nvd)
{
if (nvd == null)
{
return true;
}
for (int n = 0; n < cnt; n++)
{
ViewDelta vd = vds[n];
if (vd == null)
{
break;
}
if (vd.gaddr == nvd.gaddr && vd.leaderId == nvd.leaderId && vd.prevVid == nvd.prevVid)
{
return true;
}
}
return false;
}
private static bool isLeaving(Address who, ViewDelta[] vds)
{
foreach (ViewDelta vd in vds)
{
foreach (Address l in Expand(vd.leavers))
{
if (l == who)
{
return true;
}
}
}
return false;
}
private static void includeGroup(int n, string[] names, Address[] gaddrs, long[] tsigs, View[] vs, bool[] isl, Group g)
{
names[n] = g.gname;
gaddrs[n] = g.gaddr;
tsigs[n] = g.TypeSig;
vs[n] = g.theView;
isl[n] = (g.flags & Group.G_ISLARGE) != 0;
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("Including initial view info: <" + g.gname + ">, <" + g.gaddr + ">, view " + g.theView);
}
}
private static int CountLive(List<Group> glist)
{
int nLive = 0;
foreach (Group g in glist)
{
if (g.theView == null)
{
throw new VsyncException("theView null for group " + g.gname + " in CountLive");
}
for (int m = 0; m < g.theView.members.Length; m++)
{
if (!g.theView.hasFailed[m])
{
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (!Vsync.RIPList.Contains(g.theView.members[m]))
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("PROPOSE/CountLive: <" + g.gname + "> expecting a reply from " + g.theView.members[m]);
}
++nLive;
}
else if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("PROPOSE/CountLive: <" + g.gname + "> NOT expecting a reply from " + g.theView.members[m] + " (shown as dead in this view)");
}
}
}
}
}
return nLive;
}
internal static int CountLive(List<Address> alist)
{
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
return alist.Count(a => !Vsync.RIPList.Contains(a));
}
}
// Updates vds to list terminal message send counts, computes the flush vector,
// and then returns the number of Oracle members who replied to the proposal
private static int Tabulate(List<Group> glist, List<byte[]>[] bas, ViewDelta[] vds, ref UnstableList[] usl)
{
int gn;
int ngroups = glist.Count;
List<Address> GotResponseFrom = new List<Address>();
Group[] garray = glist.ToArray();
if (bas.Length != ngroups)
{
throw new VsyncException("Inconsistency in Tabulate: glist.Count was " + ngroups + ", but reply byte array contained " + bas.Length + " reply vectors!");
}
UnstableList[][] UnstableMsgs = new UnstableList[bas.Length][];
for (gn = 0; gn < ngroups; gn++)
{
Group g = garray[gn];
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
UnstableMsgs[gn] = new UnstableList[g.theView.members.Length];
for (int who = 0; who < g.theView.members.Length; who++)
{
UnstableMsgs[gn][who] = new UnstableList(g.gaddr, NULLADDRESS, g.theView.members[who], g.theView.viewid, -1, -1);
}
}
}
int nOracleReplies = 0;
foreach (List<byte[]> baGroup in bas)
{
for (int outerWho = 0; outerWho < baGroup.Count; outerWho++)
{
int who = outerWho;
byte[] ba = baGroup[who];
object[] obs = Msg.BArrayToObjects(ba);
// Msg.InvokeFromBArray(ba, (tabulator)((sender, rvds) =>
if (obs.Length == 2 && obs[0].GetType() == typeof(Address) && obs[1].GetType() == typeof(ViewDelta[]))
{
Address sender = (Address)obs[0];
GotResponseFrom.Add(sender);
ViewDelta[] rvds = (ViewDelta[])obs[1];
if (rvds.Length == 0)
{
++nOracleReplies;
}
else
{
bool oFound = false;
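// Bound by both arrays so that a short reply vector cannot index past the end of rvds
for (int n = 0; n < vds.Length && n < rvds.Length; n++)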
{
ViewDelta rvd = rvds[n];
Group g;
if (rvd.gaddr == ORACLE.gaddr)
{
g = ORACLE;
}
else if ((g = Group.TrackingProxyLookup(rvd.gaddr)) == null)
{
// Very rarely, I've seen this happen and am unclear how the race arises...
Vsync.Sleep(2500);
if ((g = Group.TrackingProxyLookup(rvd.gaddr)) == null)
{
return 0;
}
}
if (!oFound && g == ORACLE)
{
oFound = true;
++nOracleReplies;
}
if (rvd.prevVid == vds[n].prevVid)
{
gn = -1;
if (rvd.lastSeqns.Length != vds[n].lastSeqns.Length)
{
Vsync.WriteLine("WARNING: In Vsync/Tabulate, rvd.lastSeqns length was " + rvd.lastSeqns.Length + ", should have been equal to vds[n].lastSeqns length " + vds[n].lastSeqns.Length);
return 0;
}
for (int i = 0; i < vds[n].lastSeqns.Length; i++)
{
if (rvd.lastSeqns.Length > 0 && vds[n].lastSeqns[i] != rvd.lastSeqns[i])
{
if (gn == -1)
{
for (gn = 0; gn < ngroups; gn++)
{
if (glist[gn].gaddr == rvd.gaddr)
{
break;
}
}
if (gn == ngroups)
{
throw new VsyncException("Inconsistency in Tabulate: glist didn't contain group " + rvd.gaddr);
}
}
if (vds[n].lastSeqns[i] < rvd.lastSeqns[i])
{
UnstableMsgs[gn][i].flusher = sender;
if (UnstableMsgs[gn][i].mid_low == -1)
{
UnstableMsgs[gn][i].mid_low = rvd.lastSeqns[i];
}
else if (UnstableMsgs[gn][i].mid_hi != -1)
{
UnstableMsgs[gn][i].mid_low = Math.Min(UnstableMsgs[gn][i].mid_low, UnstableMsgs[gn][i].mid_hi);
}
UnstableMsgs[gn][i].mid_hi = rvd.lastSeqns[i];
vds[n].lastSeqns[i] = rvd.lastSeqns[i];
}
else
{
UnstableMsgs[gn][i].mid_low = rvd.lastSeqns[i];
}
}
}
}
}
}
}
else
{
throw new VsyncException("in Tabulate expected reply signature { Vsync.Address Vsync.Vsync+ViewDelta[] }");
}
}
}
int nUnstable = 0;
for (gn = 0; gn < ngroups; gn++)
{
for (int who = 0; who < UnstableMsgs[gn].Length; who++)
{
if (!GotResponseFrom.Contains(UnstableMsgs[gn][who].sender) && UnstableMsgs[gn][who].mid_hi != -1 && UnstableMsgs[gn][who].mid_low != -1)
{
nUnstable++;
}
}
}
usl = new UnstableList[nUnstable];
nUnstable = 0;
for (gn = 0; gn < ngroups; gn++)
{
for (int who = 0; who < UnstableMsgs[gn].Length; who++)
{
if (!GotResponseFrom.Contains(UnstableMsgs[gn][who].sender) && UnstableMsgs[gn][who].mid_hi != -1 && UnstableMsgs[gn][who].mid_low != -1)
{
usl[nUnstable++] = UnstableMsgs[gn][who];
}
}
}
return nOracleReplies;
}
// Runs in all participants
internal static void CommitGVUpdates(Group g, ViewDelta[] vds)
{
View v = null;
CommitGVUpdates(g, vds, ref v);
}
internal static void CommitGVUpdates(Group g, ViewDelta[] vds, ref View newView)
{
bool IamOracle = g == ORACLE && g.theView.GetMyRank() != -1;
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
string vs = " ";
foreach (ViewDelta vd in vds)
{
vs += " " + vd + Environment.NewLine;
}
Vsync.Write("CommitGVUpdates<" + g.gname + ">... IAmOracle=" + IamOracle + ":" + Environment.NewLine + vs);
}
long before = Vsync.NOW;
if (IamOracle)
{
List<ViewDelta> vdsApplied = new List<ViewDelta>();
foreach (ViewDelta vd in vds)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("... CommitGVUpdates(IAmOracle) now applying " + vd);
}
if (vd.gaddr != ORACLE.gaddr)
{
Group tpg = Group.TrackingProxyLookup(vd.gaddr) ?? Group.TrackingProxy(vd.gname, "Commit GVUPdates", vd.gaddr, vd.tsig, vd.mcmdmap, new View(vd.gname, vd.gaddr, vd.joiners, vd.prevVid + 1, false), vd.isLarge ? Group.G_ISLARGE : 0, false);
UpdateGroupView(true, vd, tpg, "TrackingProxy", ref newView);
}
else
{
UpdateGroupView(true, vd, g, "ORACLE:self-update", ref newView);
}
vdsApplied.Add(vd);
}
using (var tmpLockObj = new LockAndElevate(GVELock))
{
List<GVEvent> newGVEList = new List<GVEvent>();
foreach (GVEvent gve in GVEList)
{
bool fnd_all = true;
foreach (Address gaddr in gve.gaddrs)
{
bool fnd = false;
foreach (ViewDelta vd in vdsApplied)
{
if (vd.gaddr == gaddr && (gve.request == JOIN ? vd.joiners.Contains(gve.who) : vd.leavers.Contains(gve.who)))
{
fnd = true;
break;
}
}
if (!(fnd_all &= fnd))
{
break;
}
}
if (!fnd_all)
{
newGVEList.Add(gve);
}
}
GVEList = newGVEList;
}
}
else
{
foreach (ViewDelta vd in vds)
{
if (vd.gaddr == g.gaddr)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.GVELOGIC)) != 0)
{
Vsync.WriteLine("... CommitGVUpdates now applying " + vd);
}
UpdateGroupView(true, vd, g, "VsyncGroups", ref newView);
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.DELAYS) != 0 && (Vsync.NOW - before) > Vsync.VSYNC_WARNAFTER)
{
Vsync.WriteLine("WARNING: LONG DELAY while calling UpdateGroupView (" + (Vsync.NOW - before) + "ms)");
}
}
// Called with GroupIsReal true from CommitGVUpdates
// Called with GroupIsReal false when computing a "working view" in the token tree manager
internal static void UpdateGroupView(bool GroupIsReal, ViewDelta vd, Group g, string queue)
{
View newView = null;
UpdateGroupView(GroupIsReal, vd, g, queue, ref newView, false);
}
internal static void UpdateGroupView(bool GroupIsReal, ViewDelta vd, Group g, string queue, ref View nv)
{
UpdateGroupView(GroupIsReal, vd, g, queue, ref nv, false);
}
// Note that inhibitActions is only used with GroupIsReal set to false...
internal static void UpdateGroupView(bool GroupIsReal, ViewDelta vd, Group g, string queue, ref View nv, bool inhibitActions)
{
if (g != Vsync.ORACLE && vd.gaddr != g.gaddr)
{
return;
}
if (GroupIsReal)
{
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
nv = g.theView;
}
}
Address[] joiners = Expand(vd.joiners);
Address[] leavers = Expand(vd.leavers);
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("UpdateGroupView<" + g.gname + (GroupIsReal ? string.Empty : " TrackingProxy") + ">: viewdelta= " + vd + ", expanded joiners = " + Address.VectorToString(joiners) + "; leavers " + Address.VectorToString(leavers));
}
bool IAmLeaving = false;
List<Address> newView = new List<Address>();
if (g.theView == null)
{
throw new VsyncException("g.theView null in UpdateGroupView");
}
if (GroupIsReal)
{
if (g.theView.viewid > vd.prevVid)
{
if (vd.prevVid != -1 && (VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("WARNING: UpdateGroupView ignoring a superfluous view update because <" + g.gname + "> is on view " + g.theView.viewid + ", but vd.prevVid is " + vd.prevVid + " in state " + VsyncSystem.GetState());
}
return;
}
if (g.theView.viewid != vd.prevVid)
{
if ((g.flags & Group.G_ISLARGE) == 0 || g.theView.viewid <= vd.prevVid)
{
if (g.theView.viewid > 0)
{
// Viewid can "reset" if all members leave; this case arises if a concurrent join was underway...
Vsync.WriteLine("WARNING... in CommitGVUpdates: <" + g.gname + ">: updating wrong view! (Expected " + vd.prevVid + ", found " + g.theView.viewid + ")... group was on " + queue + Environment.NewLine + "View Delta:" + vd + Environment.NewLine);
}
g.GotLastSeqns(vd.prevVid, new int[g.theView.members.Length]);
}
else
{
return;
}
}
else
{
g.GotLastSeqns(vd.prevVid, vd.lastSeqns);
}
}
int lcnt = 0;
int jcnt = 0;
int ntotal = g.theView.members.Length;
Address[] validatedLeavers = new Address[leavers.Length];
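// One extra slot is reserved because my_address may be appended below when vd.prevVid == -1; the array is trimmed to jcnt afterwards
Address[] validatedJoiners = new Address[joiners.Length + 1 + ((g.myTargetGroupSize > 0 && ntotal > g.myTargetGroupSize) ? leavers.Length : 0)];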
for (int i = 0; i < ntotal; i++)
{
Address a = g.theView.members[i];
if (leavers.Contains(a))
{
validatedLeavers[lcnt++] = a;
if (a.isMyAddress())
{
IAmLeaving = true;
}
if (g.myTargetGroupSize > 0 && ntotal > g.myTargetGroupSize)
{
while (ntotal > i && leavers.Contains(g.theView.members[ntotal - 1]))
{
--ntotal;
}
if (ntotal > g.myTargetGroupSize)
{
Address who = g.theView.members[--ntotal];
newView.Add(who);
validatedJoiners[jcnt++] = who;
}
}
}
else
{
newView.Add(a);
}
}
foreach (Address a in joiners)
{
if (!newView.Contains(a))
{
newView.Add(a);
validatedJoiners[jcnt++] = a;
}
}
if (vd.prevVid == -1 && !validatedJoiners.Contains(Vsync.my_address))
{
validatedJoiners[jcnt++] = Vsync.my_address;
}
if (lcnt != validatedLeavers.Length)
{
Vsync.ArrayResize(ref validatedLeavers, lcnt);
}
if (jcnt != validatedJoiners.Length)
{
Vsync.ArrayResize(ref validatedJoiners, jcnt);
}
if (newView.Count == 0 && g.isTrackingProxy)
{
g.GroupClose();
using (var tmpLockObj = new LockAndElevate(Group.TPGroupsLock))
{
Group.TPGroups.Remove(g.gaddr);
}
return;
}
View nextView = new View(vd.gname, vd.gaddr, newView.ToArray(), vd.prevVid + 1, (g.flags & Group.G_ISLARGE) != 0);
if (!g.isTrackingProxy)
{
nextView.joiners = validatedJoiners;
nextView.offset = vd.offset;
nextView.leavers = validatedLeavers;
}
if ((VsyncSystem.Debug & VsyncSystem.VIEWCHANGE) != 0)
{
Vsync.WriteLine("UpdateGroupView<" + g.gname + (GroupIsReal ? string.Empty : " TrackingProxy") + ">: nextView = " + nextView + ", InhibitActions=" + inhibitActions);
}
if (inhibitActions)
{
nv = nextView;
return;
}
if (GroupIsReal)
{
if ((g.flags & Group.G_ISLARGE) != 0 && g.theView != null)
{
nextView.NextIncomingMsgID[1] = g.theView.NextIncomingMsgID[1];
}
long before = Vsync.NOW;
g.NewView(nextView, queue, null, ref nv);
if ((VsyncSystem.Debug & VsyncSystem.DELAYS) != 0 && (Vsync.NOW - before) > Vsync.VSYNC_WARNAFTER)
{
Vsync.WriteLine("WARNING: g.NewView delayed for " + (Vsync.NOW - before) + "ms");
}
if (!IAmLeaving)
{
g.ReplayToDo();
}
}
else
{
// Tracking Proxy in the ORACLE
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
g.theView = nextView;
nv = nextView;
// Remember the value just in case we "later" need to use it to initialize a joining member
nv.nextMsgid = g.nextMsgid;
}
}
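// leavers is only assigned above for non-proxy groups, so guard against null here
if (nextView.leavers != null && nextView.leavers.Length > 0)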
{
ReliableSender.PendingSendCleanup(g, nextView.leavers);
}
}
internal static Address[] Expand(Address[] list)
{
Group.vGroup vg = null;
bool mustRevise = false;
int newLen = list.Length;
foreach (Address a in list)
{
if (a.pid < 0 && (vg = Group.vGLookup(a)) != null)
{
mustRevise = true;
newLen = newLen - 1 + vg.vGMembers.Length;
}
}
if (!mustRevise)
{
return list;
}
Address[] newList = new Address[newLen];
int idx = 0;
foreach (Address address in list)
{
if (address.pid < 0 && ((vg != null && vg.vGroupAddr == address) || (vg = Group.vGLookup(address)) != null))
{
foreach (Address ma in vg.vGMembers)
{
newList[idx++] = ma;
}
}
else
{
newList[idx++] = address;
}
}
return newList;
}
}
/// <summary>
/// A class containing various static definitions and methods used by the Vsync system.
/// </summary>
public static class VsyncSystem
{
internal static bool VsyncActive = false;
internal static long StartedAt;
/// <summary>
/// A getter method for VsyncActive
/// </summary>
/// <returns>True if the library is active, false if not</returns>
public static bool VsyncIsActive()
{
return VsyncActive;
}
internal static bool VsyncWasActive = false;
internal static bool VsyncAlreadyRan = false;
internal static bool VsyncRestarting = false;
internal static bool fastStart = false;
internal static int VsyncJoinCounter = 0;
internal const long GROUPEVENTS = 0x0000000000000001;
internal const long MESSAGELAYER = 0x0000000000000002;
internal const long REPLYWAIT = 0x0000000000000004;
internal const long VIEWCHANGE = 0x0000000000000008;
internal const long FRAGER = 0x0000000000000010;
internal const long TYPESIGS = 0x0000000000000020;
internal const long RELAYLOGIC = 0x0000000000000040;
internal const long GVELOGIC = 0x0000000000000080;
internal const long P2PLAYER = 0x0000000000000100;
internal const long INTERFACES = 0x0000000000000200;
internal const long LOCKSTATE = 0x0000000000000400;
internal const long MCMD = 0x0000000000000800;
internal const long STARTSEQ = 0x0000000000001000;
internal const long VIEWWAIT = 0x0000000000002000;
internal const long CALLBACKS = 0x0000000000004000;
internal const long FAILURES = 0x0000000000008000;
internal const long DISCARDS = 0x0000000000010000;
internal const long DALLOGIC = 0x0000000000020000;
internal const long TOKENLOGIC = 0x0000000000040000;
internal const long TOKENFLUSH = 0x0000000000080000;
internal const long FLUSHING = 0x0000000000100000;
internal const long VERBOSEADDRS = 0x0000000000200000;
internal const long MSGIDS = 0x0000000000400000;
internal const long AGGREGATION = 0x0000000000800000;
internal const long LOCKCHECK = 0x0000000001000000;
internal const long MCMDMAP = 0x0000000002000000;
internal const long LOOPBACK = 0x0000000004000000;
internal const long PPAYLOADS = 0x0000000008000000;
internal const long DELIVERY = 0x0000000010000000;
internal const long TUNNELING = 0x0000000040000000;
internal const long TIMERS = 0x0000000080000000;
internal const long FLOWCONTROL = 0x0000000100000000;
internal const long CIPHER = 0x0000000200000000;
internal const long DHTS = 0x0000000400000000;
internal const long NACKS = 0x0000000800000000;
internal const long DELAYS = 0x0000001000000000;
/* Caution: LOWLEVELMSGS is very verbose and uses in-memory storage. Must call VsyncSystem.WriteAckInfo periodically or will leak memory */
internal const long LOWLEVELMSGS = 0x0000002000000000;
internal const long TOKENSTABILITY = 0x0000004000000000;
internal const long CAUSALDELIVERY = 0x0000008000000000;
internal const long DISKLOGGER = 0x0000010000000000;
internal const long MSGQS = 0x0000020000000000;
internal const long WARNIFSLOW = 0x0000040000000000;
internal const long SPECIALDEBUG = 0x0000080000000000;
internal const long LOCKS = 0x0000100000000000;
internal const long SAFESEND = 0x0000200000000000;
internal const long ORDEREDSEND = 0x0000400000000000;
internal const long WARNABOUTSUBSETS = 0x0000800000000000;
internal const long OOBXFERS = 0x0001000000000000;
internal const long PENDINGSENDS = 0x0002000000000000;
internal const long IBDB = 0x0004000000000000;
internal const long TOKENLAYER = FLUSHING | TOKENFLUSH | TOKENLOGIC;
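// Bitwise OR of the flags above, selecting which debug traces are emitted; 0L disables all of them
internal const long Debug = 0L;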
/// <summary>
/// Returns this process's address.
/// </summary>
/// <returns>The address of the calling process</returns>
public static Address GetMyAddress()
{
if (!VsyncActive)
{
throw new VsyncException("Vsync isn't running (did you forget to call Vsync.Start()?)");
}
return Vsync.my_address;
}
/// <summary>
/// An interface for obtaining a copy of the NULLADDRESS object
/// </summary>
/// <returns>The NULLADDRESS object</returns>
public static Address GetNullAddress()
{
return Vsync.NULLADDRESS;
}
/// <summary>
/// Returns a pretty-printable string giving the internal state of Vsync on this node.
/// </summary>
/// <returns>A multi-line state dump, or "VSYNC is inactive" if the library is not running</returns>
public static string GetState()
{
GC.Collect();
if (!VsyncSystem.VsyncActive || Vsync.my_address == null)
{
return "VSYNC is inactive";
}
string state = Environment.NewLine + "----------------------- VSYNC state for " + Vsync.my_address.ToStringVerboseFormat() +
(Vsync.ClientOf == null ? " [leaderId=" + Vsync.LeaderId + "]" : string.Empty) + ":" + Environment.NewLine;
for (int sc = 0; sc < 14; sc++)
{
try
{
switch (sc)
{
case 0:
state += RunTimeStatsState();
break;
case 1:
state += Group.GetState();
break;
case 2:
if (Vsync.VSYNC_INFINIBAND)
{
state += IB.GetState();
}
break;
case 3:
state += Vsync.GetTimerState();
break;
case 4:
state += Vsync.RIPListState();
break;
case 5:
state += Vsync.GetPLLState();
break;
case 6:
state += Vsync.GetGVEState();
break;
case 7:
state += AwaitReplies.GetState();
break;
case 8:
state += FlowControl.GetState();
break;
case 9:
state += ReliableSender.GetState();
break;
case 10:
state += Group.deFragState();
break;
case 11:
state += MCMDSocket.GetState();
break;
case 12:
state += ILock.GetThreadStates() + ILock.GetState();
break;
case 13:
state += BoundedBuffer.GetState();
break;
}
}
catch (Exception e)
{
// Labels are indexed by sc and must stay in step with the switch cases above
string[] what = { "RunTimeStats", "Group", "IB", "Timer", "RIPList", "PLL", "GVE", "AwaitReplies", "FlowControl", "ReliableSender", "deFrag", "MCMDSocket", "ILock", "BoundedBuffer" };
state += "Vsync threw an internal exception <" + e + "> while trying to perform " + what[sc] + ".GetState()" + Environment.NewLine;
}
}
state += "-------------------------End of State Dump------------------------------------" + Environment.NewLine;
return state;
}
/// <summary>
/// Returns a string summarizing the state of the Vsync I/O subsystem
/// </summary>
/// <returns>A summary of the network send and receive counters</returns>
public static string GetIOState()
{
return RunTimeStatsState();
}
internal static void WriteAckInfo()
{
if ((VsyncSystem.Debug & VsyncSystem.LOWLEVELMSGS) != 0)
{
using (var tmpLockObj = new LockAndElevate(ReliableSender.ackInfoLock))
{
foreach (string aks in ReliableSender.ackInfo)
{
Vsync.Write(aks);
}
ReliableSender.ackInfo = new List<string>();
}
}
}
/// <summary>
/// Runtime statistics for this version of Vsync
/// </summary>
public class RuntimeStats
{
/// <summary>
/// UDP packets sent
/// </summary>
public long UDPsent;
/// <summary>
/// UDP bytes sent
/// </summary>
public long UDPBsent;
/// <summary>
/// UDP packets received
/// </summary>
public long UDPrcvd;
/// <summary>
/// UDP bytes received
/// </summary>
public long UDPBrcvd;
/// <summary>
/// IPMC packets sent
/// </summary>
public long IPMCsent;
/// <summary>
/// IPMC bytes sent
/// </summary>
public long IPMCBsent;
/// <summary>
/// IPMC packets received
/// </summary>
public long IPMCrcvd;
/// <summary>
/// IPMC bytes received
/// </summary>
public long IPMCBrcvd;
/// <summary>
/// IB (Infiniband) packets sent
/// </summary>
public long IBsent;
/// <summary>
/// IB (Infiniband) bytes sent
/// </summary>
public long IBBsent;
/// <summary>
/// IB (Infiniband) packets received
/// </summary>
public long IBrcvd;
/// <summary>
/// IB (Infiniband) bytes received
/// </summary>
public long IBBrcvd;
/// <summary>
/// Tokens sent
/// </summary>
public long TokensSent;
/// <summary>
/// Tokens received
/// </summary>
public long TokensRcvd;
/// <summary>
/// Acks sent
/// </summary>
public long ACKsent;
/// <summary>
/// Acks received
/// </summary>
public long ACKrcvd;
/// <summary>
/// Nacks sent
/// </summary>
public long NACKsent;
/// <summary>
/// Nacks received
/// </summary>
public long NACKrcvd;
/// <summary>
/// Stability IPMC sent
/// </summary>
public long StabilitySent;
/// <summary>
/// Stability IPMC received
/// </summary>
public long StabilityRcvd;
/// <summary>
/// Token-triggered retransmissions
/// </summary>
public long TTRet;
/// <summary>
/// Total packets discarded as dups or for having no local receiver
/// </summary>
public long Discarded;
internal LockObject Lock = new LockObject(false, "RTS.Lock");
internal long rcvProcessingBeganAt;
internal long ackedAt;
internal long ackProcessingBeganAt;
#if WARN_ON_LONG_DELAYS
internal long checkedAt;
#endif
internal int[] ThreadCntrs = new int[36];
internal void check()
{
#if WARN_ON_LONG_DELAYS
if (Vsync.NOW - VsyncSystem.StartedAt < 20000 || Vsync.NOW - this.checkedAt < 1000)
{
return;
}
this.checkedAt = Vsync.NOW;
long rdelay = 0;
long a0delay = 0;
long a1delay = 0;
using (var tmpLockObj = new LockAndElevate(this.Lock))
{
if (RTS.rcvProcessingBeganAt > 0)
{
rdelay = Vsync.NOW - RTS.rcvProcessingBeganAt;
}
if (RTS.ackedAt > 0)
{
a0delay = Vsync.NOW - RTS.ackedAt;
}
if (RTS.ackProcessingBeganAt > 0)
{
a1delay = Vsync.NOW - RTS.ackProcessingBeganAt;
}
}
if (rdelay > Vsync.VSYNC_WARNAFTER)
{
if (rdelay > Vsync.VSYNC_DEFAULTTIMEOUT * 3)
{
throw new VsyncException("EXCEPTION: receive thread has been processing a received message for " + rdelay + "ms");
}
if ((VsyncSystem.Debug & VsyncSystem.WARNIFSLOW) != 0)
{
Vsync.WriteLine("WARNING: receive thread has been processing a received message for " + rdelay + "ms");
}
}
if (a0delay > Vsync.VSYNC_WARNAFTER)
{
if (a0delay > Vsync.VSYNC_DEFAULTTIMEOUT * 3)
{
throw new VsyncException("EXCEPTION: ACK socket-handler thread has been trying to put a received ack on the ACKBB for " + a0delay + "ms");
}
if ((VsyncSystem.Debug & VsyncSystem.WARNIFSLOW) != 0)
{
Vsync.WriteLine("WARNING: ACK socket-handler thread has been trying to put a received ack on the ACKBB for " + a0delay + "ms");
}
}
if (a1delay > Vsync.VSYNC_WARNAFTER)
{
if (a1delay > Vsync.VSYNC_DEFAULTTIMEOUT * 3)
{
throw new VsyncException("EXCEPTION: ACK processing thread has been processing a received ack/nack for " + a1delay + "ms");
}
if ((VsyncSystem.Debug & VsyncSystem.WARNIFSLOW) != 0)
{
Vsync.WriteLine("WARNING: ACK processing thread has been processing a received ack/nack for " + a1delay + "ms");
}
}
#endif
}
}
internal static bool noConsole;
/// <summary>
/// System statistics
/// </summary>
public static RuntimeStats RTS = new RuntimeStats();
internal static string RunTimeStatsState()
{
string s = "Summary of network statistics:" + Environment.NewLine + " ";
using (var tmpLockObj = new LockAndElevate(VsyncSystem.RTS.Lock))
{
s += "SENT: " + RTS.UDPsent + " UDP (" + RTS.UDPBsent + " non-duplicated bytes), " + RTS.TokensSent + " tokens, " + RTS.IPMCsent + " IPMC (" + RTS.IPMCBsent + " bytes; " + RTS.StabilitySent + " were stability packets), ";
s += RTS.ACKsent + " Acks, " + RTS.NACKsent + " Nacks, " + RTS.Discarded + " Discards." + Environment.NewLine + " ";
s += "RECV: " + RTS.UDPrcvd + " UDP (" + RTS.Discarded + " were dups; " + RTS.UDPBrcvd + " bytes), " + RTS.TokensRcvd + " tokens, " + RTS.IPMCrcvd + " IPMC (" + RTS.IPMCBrcvd + " bytes; " + RTS.StabilityRcvd + " were stability packets), ";
s += RTS.Discarded + " were dups, " + RTS.ACKrcvd + " Acks, " + RTS.NACKrcvd + " Nacks, " + RTS.TTRet + " token-triggered resends" + Environment.NewLine;
if (Vsync.VSYNC_USERDMA)
{
s += " RDMA: " + RTS.IBsent + " RDMA sends (" + RTS.IBBsent + " bytes), " + RTS.IBrcvd + " receives (" + RTS.IBBrcvd + " bytes)" + Environment.NewLine;
}
}
return s;
}
/// <summary>
/// Prints a line to the console and to the log
/// </summary>
public static void WriteLine(string s)
{
Vsync.WriteLine(s);
}
/// <summary>
/// Prints a string to the console and to the log
/// </summary>
public static void Write(string s)
{
Vsync.Write(s);
}
internal static Thread ParentThread;
/// <summary>
/// Called to start the Vsync subsystem. Must be your first call to Vsync.
/// </summary>
public static void Start()
{
Start(false, false);
}
/// <summary>
/// Called to Start the Vsync subsystem. Must be your first call to Vsync.
/// </summary>
/// <param name="fast">If true, starts without searching for an already active Vsync ORACLE</param>
public static void Start(bool fast)
{
Start(false, fast);
}
/// <summary>
/// Called to Start the Vsync subsystem. Must be your first call to Vsync.
/// </summary>
/// <param name="noConsole">If true, Vsync won't write to the Console (it will generate a log and will write to the debug stream, if needed)</param>
/// <param name="fast">If true, starts without searching for an already active Vsync ORACLE</param>
public static void Start(bool noConsole, bool fast)
{
if (!Stopwatch.IsHighResolution)
{
throw new VsyncException("Vsync requires access to a high-resolution timer.");
}
if (VsyncActive && !Vsync.WORKER_MODE)
{
throw new VsyncException("Vsync can't be started multiple times.");
}
if (VsyncAlreadyRan)
{
throw new VsyncException("Can't restart Vsync after shutdown");
}
fastStart = fast;
VsyncSystem.noConsole = noConsole;
ParentThread = Thread.CurrentThread;
#if PROTOCOL_BUFFERS
RuntimeTypeModel.Default.AllowParseableTypes = true;
#endif
StartedAt = Vsync.NOW;
Thread t = new Thread(() =>
{
// .NET has a peculiar behavior when the class loader is still active: threads can be
// created but they don't really run. To work around that, Vsync Start() itself runs in
// a separate thread. If this thread can start, other threads can start too!
try
{
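// Double the default timeout while Vsync is starting up; it is halved again once startup completes.
Vsync.VSYNC_DEFAULTTIMEOUT <<= 1;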
VsyncWasActive = VsyncActive = true;
VsyncAlreadyRan = true;
VsyncRestarting = true;
Vsync.timer_thread = new Thread(Vsync.TimerThread) { Priority = ThreadPriority.Highest, Name = "Vsync timer callback thread", IsBackground = true };
Vsync.timer_thread.Start();
ReliableSender.ResenderThreadLaunch();
if (Vsync.my_IPaddress == null)
{
Vsync.my_IPaddress = Vsync.setMyAddress();
}
if (!Vsync.WORKER_MODE)
{
WaitUntilVsyncIsRunning(true);
if (VsyncRestarting || !VsyncActive)
{
throw new VsyncException("Vsync Start Failed(VsyncRestarting=" + VsyncRestarting + ", VsyncActive=" + VsyncActive + ")");
}
}
if (Vsync.VSYNC_LOGGED)
{
bool vsync_mute = Vsync.VSYNC_MUTE;
Vsync.VSYNC_MUTE = true;
Vsync.WriteLine("VSYNC process-id: " + Vsync.my_pid + ", my_address = " + Vsync.my_address);
Vsync.VSYNC_MUTE = vsync_mute;
}
Vsync.SetupIM();
WaitUntilVsyncIsRunning(false);
if (Vsync.VSYNC_INFINIBAND)
{
IB.Setup();
}
// Startup succeeded: undo the doubling of the default timeout applied at the top of this thread.
Vsync.VSYNC_DEFAULTTIMEOUT >>= 1;
if (Vsync.WORKER_MODE)
{
Vsync.VSYNCMEMBERS.Watch[Vsync.MY_MASTER] += ev => { throw new VsyncException("master termination"); };
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "VsyncSystem.Start() initialization thread.", IsBackground = true };
t.Start();
t.Join();
if (Vsync.VSYNCMEMBERS != null && Vsync.VSYNCMEMBERS.HasFirstView && Vsync.VSYNCMEMBERS.GetMyRank() > Vsync.VSYNC_MAXSYSTEMSIZE)
{
Vsync.WriteLine("WARNING: This version of Vsync has only been tested with up to " + Vsync.VSYNC_MAXSYSTEMSIZE + " members. Exceeding this threshold may trigger bugs or performance anomalies!");
}
}
/// <summary>
/// Gets the time since the system was started in milliseconds.
/// </summary>
/// <returns>The time since the system was started in milliseconds.</returns>
public static long NOW
{
get
{
return Vsync.NOW;
}
}
/// <summary>
/// Gets the time that VSYNC was started in milliseconds.
/// </summary>
/// <returns>The time that VSYNC was started in milliseconds.</returns>
public static long STARTED
{
get
{
return StartedAt;
}
}
/// <summary>
/// Gets the precise time since the system was started in ticks (100 nanoseconds).
/// </summary>
/// <returns>The precise time since the system was started in ticks (100 nanoseconds).</returns>
public static long TICKS
{
get
{
return Vsync.TICKS;
}
}
private static int SafelyGetMyRank(Group g)
{
if (g == null || !g.HasFirstView)
{
return -1;
}
return g.theView.GetMyRank();
}
private static void WaitUntilVsyncIsRunning(bool justOracle)
{
int howLong = Vsync.WORKER_MODE ? (15 * 60) : (2 * 60); // Seconds: Worker will wait 15 minutes; others 2
Vsync.OK_TO_SEND_WORKER_REQ = true;
for (int n = 0; n < howLong && (VsyncActive || !VsyncWasActive); n++)
{
if (VsyncActive && (Vsync.ClientOf != null || SafelyGetMyRank(Vsync.ORACLE) != -1) && (justOracle || (SafelyGetMyRank(Vsync.VSYNCMEMBERS) != -1 && ReliableSender.ResenderThread != null)))
{
return;
}
Vsync.Sleep(1000);
}
throw new VsyncException("Vsync Start Failed");
}
/// <summary>
/// Declares to Vsync that this process will run as a "worker" to a master process elsewhere in the datacenter.
/// </summary>
/// <param name="myMaster">Address of the "master" process that will call BatchStart(), as a verbose-format string (<see cref="Address.ToStringVerboseFormat()"/>),
/// or the IP address of the machine the master is on, in standard IPv4 notation: xxx.xxx.xxx.xxx</param>
/// <param name="timeoutms">Timeout in ms (VSYNC_DEFAULTTIMEOUT * 10 if not specified)</param>
/// <remarks>When using RunAsWorker(), the workers will typically be launched on some large number of nodes, passing the Address of their
/// master in as an argument to RunAsWorker(). The workers then call Vsync.Start() and block until the master activates them by calling BatchStart().
/// The master, in contrast, calls Vsync.Start() and then RegisterAsMaster(). It launches the workers, giving them its Vsync Address as an argument, and
/// then waits until all the workers have started up; as each one calls RunAsWorker() the master receives an upcall providing the Address of that worker.
/// When the master believes that all workers are running, it calls BatchStart() and this releases the workers, which will be waiting in Vsync.Start().
///
/// If timeoutms milliseconds pass after the request is sent and the worker has not yet been "BatchStarted" by the master, it calls Shutdown.
///
/// Notice that there are some apparent race conditions here. Vsync handles them automatically: if calls to RunAsWorker() occur before the master
/// calls RegisterAsMaster(), the incoming worker addresses are held and then upcalls occur once the RegisterAsMaster() method is finally called.
///
/// Vsync provides no help on "how long to wait" for workers to start, but beware of problems in which you try to launch 1000 workers but somehow, only 992
/// launch successfully. The master won't receive 1000 callbacks no matter how long it waits and in such cases, your code just needs to handle the
/// situation. On most data centers, a 30 second wait should be long enough: if a worker hasn't registered by then, it probably won't. The master
/// can call RejectWorker() if it gets an upcall to its NewWorker callback after it no longer wants to receive additional such calls.
/// </remarks>
public static void RunAsWorker(string myMaster, int timeoutms)
{
Vsync.WORKER_MODE = true;
VsyncSystem.VsyncActive = true;
Vsync.MY_MASTER = new Address(myMaster);
Vsync.my_IPaddress = Vsync.setMyAddress();
new Thread(() =>
{
while (!Vsync.OK_TO_SEND_WORKER_REQ)
{
Vsync.Sleep(250);
}
ReliableSender.SendP2P(Msg.RUNASWORKER, Vsync.MY_MASTER, null, new byte[0], true);
Vsync.OnTimer(timeoutms, () =>
{
if (Vsync.VSYNCMEMBERS == null || !Vsync.VSYNCMEMBERS.HasFirstView || Vsync.VSYNCMEMBERS.theView.GetMyRank() == -1)
{
Shutdown("RunAsWorker: taking too long");
}
});
}) { Name = "RunAsWorker launcher thread", IsBackground = true }.Start();
}
/// <summary>
/// Overload for RunAsWorker(string myMaster, int timeoutms)
/// </summary>
/// <param name="myMaster">Vsync.Address of the master, or IP address of the machine the master is on, in standard IPv4 notation: xxx.xxx.xxx.xxx</param>
public static void RunAsWorker(string myMaster)
{
RunAsWorker(myMaster, Vsync.VSYNC_DEFAULTTIMEOUT * 10);
}
internal static NewWorker MasterCallBack;
/// <summary>
/// The master uses this to register a callback that will be invoked once for each worker that calls RunAsWorker(), giving the Address of that worker.
/// Later, when the master terminates, all its workers throw VsyncException("master termination").
/// </summary>
/// <param name="del">Callback invoked with the Address of each worker that registers</param>
public static void RegisterAsMaster(NewWorker del)
{
MasterCallBack = del;
ReliableSender.ReplaySavedWorkers();
}
/// <summary>
/// Rejects a RunAsWorker() request. The rejected worker shuts down and throws a VsyncShutdownException("poison").
/// </summary>
/// <param name="who">The rejected worker's Address</param>
public static void RejectWorker(Address who)
{
ReliableSender.SendPoison(who, Vsync.my_address + " has rejected you as a worker");
}
/// <summary>
/// Blocks for as long as Vsync is active on this node. Used mostly in a dedicated ORACLE process:
/// such a process calls Vsync.Start() and then Vsync.WaitForever().
/// </summary>
/// <remarks>This method checks every 15 seconds whether Vsync is still active and returns normally once it is not. Unless
/// VSYNC_GRACEFULSHUTDOWN is set, however, Shutdown() unloads the AppDomain (or exits the process), so the caller may never see the return.</remarks>
public static void WaitForever()
{
while (VsyncSystem.VsyncActive)
{
Vsync.Sleep(15000);
}
}
/// <summary>
/// Used by a master process to activate a potentially long list of worker processes. Each worker should call RunAsWorker(), which reports its address to the master, and then block in Vsync.Start().
/// The master process itself should first call Vsync.Start() and should call BatchStart() only after collecting the addresses of the workers, normally through the callback registered with RegisterAsMaster().
/// </summary>
/// <remarks>When using RunAsWorker(), the workers will typically be launched on some large number of nodes. Each worker's address must reach the master: RunAsWorker() reports it
/// to the master's RegisterAsMaster() callback, but the addresses can also be passed in other ways, for example by appending them to a file that the master will read, passing them over a TCP link, or using the Vsync SendP2P() API.
/// The workers then call Vsync.Start() and block until the master activates them by calling BatchStart().</remarks>
/// <param name="workers">Addresses of the worker processes to activate</param>
public static void BatchStart(Address[] workers)
{
Address OracleLeader = Vsync.ClientOf;
OracleLeader = OracleLeader ?? Vsync.ORACLE.theView.members[0];
foreach (Address worker in workers)
{
Vsync.SendInitialOracleLeaderInfo(worker, OracleLeader);
}
Group.multiJoin(workers, new[] { Vsync.VSYNCMEMBERS });
// Just in case, make sure all are listed in VSYNCMEMBERS before returning
foreach (Address worker in workers)
{
// Poll once per second; give up only if the worker is still missing after five 1-second waits.
int stry = 0;
while (Vsync.VSYNCMEMBERS.GetRankOf(worker) == -1)
{
if (stry++ == 5)
{
throw new VsyncException("BatchStart: Even after 5 seconds, worker " + worker + " wasn't listed in VSYNCMEMBERS");
}
Vsync.Sleep(1000);
}
}
}
/// <summary>
/// Master calls this to wait for workers to call WorkerSetupDone
/// </summary>
/// <param name="workers">worker processes</param>
public static void WaitForWorkerSetup(List<Address> workers)
{
WaitForWorkerSetup(workers.ToArray());
}
/// <summary>
/// Master calls this to wait for workers to call WorkerSetupDone
/// </summary>
/// <param name="workers">worker processes</param>
public static void WaitForWorkerSetup(Address[] workers)
{
foreach (Address worker in workers)
{
try
{
Vsync.VSYNCMEMBERS.P2PQuery(worker, new Timeout(120000, Timeout.TO_ABORTREPLY, "INQUIRE"), Vsync.INQUIRE);
}
catch (VsyncAbortReplyException e)
{
throw new VsyncException("Timeout during WaitForWorkerSetup for worker " + worker, e);
}
}
}
internal static Semaphore waitForWorkerSetup = new Semaphore(0, 1);
/// <summary>
/// Worker calls this when any setup that needed to occur after Vsync.Start is completed
/// </summary>
public static void WorkerSetupDone()
{
waitForWorkerSetup.Release();
}
/// <summary>
/// Used by a master process to activate a potentially long list of worker processes. Each worker should call RunAsWorker(), which reports its address to the master, and then block in Vsync.Start().
/// The master process itself should first call Vsync.Start() and should call BatchStart() only after collecting the addresses of the workers, normally through the callback registered with RegisterAsMaster().
/// </summary>
/// <remarks>When using RunAsWorker(), the workers will typically be launched on some large number of nodes. Each worker's address must reach the master: RunAsWorker() reports it
/// to the master's RegisterAsMaster() callback, but the addresses can also be passed in other ways, for example by appending them to a file that the master will read, passing them over a TCP link, or using the Vsync SendP2P() API.
/// The workers then call Vsync.Start() and block until the master activates them by calling BatchStart().</remarks>
/// <param name="workers">Addresses of the worker processes to activate</param>
public static void BatchStart(List<Address> workers)
{
BatchStart(workers.ToArray());
}
internal static void CheckParentThread()
{
if (VsyncActive && !ParentThread.IsAlive)
{
Shutdown("Vsync Shutdown: Parent thread has exited");
}
}
/// <summary>
/// Shuts Vsync down. In this release, restarting Vsync is not supported. Sorry!
/// </summary>
public static void Shutdown()
{
Shutdown(null);
}
/// <summary>
/// Allows you to tell Vsync that some process is dead. Ignored if the caller isn't a member of some group
/// to which that process belongs
/// </summary>
/// <param name="who">Address of the process that the caller believes has failed</param>
public static void ProcessFailed(Address who)
{
List<Group> theGroups = Group.VsyncGroupsClone();
foreach (Group g in theGroups)
{
if (g.GetRankOf(who) != -1)
{
Vsync.NodeHasFailed(who, "application logic sensed failure", false);
return;
}
}
}
internal static volatile bool shuttingDown = false;
internal static object shutdownLock = new object();
/// <summary>
/// Shuts Vsync down. In this release, restarting Vsync is not supported. Sorry!
/// Calls AppDomain.Unload(AppDomain.CurrentDomain); unless you set VSYNC_GRACEFULSHUTDOWN to true.
/// </summary>
/// <param name="s">Optional message to write to the console and log before shutting down; may be null</param>
public static void Shutdown(string s)
{
lock (shutdownLock)
{
if (shuttingDown)
{
return;
}
shuttingDown = true;
if (!VsyncSystem.VsyncActive && VsyncSystem.VsyncWasActive)
{
return;
}
}
if (s != null)
{
Vsync.WriteLine(s);
}
Dictionary<Address, Group> GClone = new Dictionary<Address, Group>();
using (var tmpLockObj = new LockAndElevate(Group.VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in Group.VsyncGroups)
{
GClone.Add(kvp.Key, kvp.Value);
}
}
if (Vsync.ClientOf != null)
{
FDInform(Vsync.ClientOf);
}
else if (Vsync.ORACLE != null && Vsync.ORACLE.HasFirstView)
{
foreach (Address a in Vsync.ORACLE.theView.members)
{
if (!a.isMyAddress())
{
FDInform(a);
break;
}
}
}
VsyncActive = false;
if (!Vsync.VSYNC_GRACEFULSHUTDOWN)
{
AbandonShip();
}
else
{
/* The code that follows tries to do a graceful shutdown of Vsync, but has chronic "issues"... */
try
{
try
{
foreach (KeyValuePair<Address, Group> kvp in GClone)
{
Group g = kvp.Value;
AwaitReplies.InterruptReplyWaits(g);
g.InterruptAggregationWaits();
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
try
{
BoundedBuffer.ShutDown();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
try
{
ReliableSender.Shutdown();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
LockAndElevate.Disable(true);
try
{
MCMDSocket.Shutdown();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
try
{
ReliableSender.toResendSema.Release();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
try
{
ILock.Shutdown();
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
try
{
Vsync.my_logstream.Flush();
}
catch (IOException)
{
}
}
catch (Exception e)
{
if (VsyncSystem.noConsole)
{
System.Diagnostics.Debug.WriteLine("While attempting to shut down Vsync, threw " + e);
}
else
{
Console.WriteLine("While attempting to shut down Vsync, threw " + e);
}
}
}
}
internal static void AbandonShip()
{
try
{
AppDomain.Unload(AppDomain.CurrentDomain);
}
catch (CannotUnloadAppDomainException)
{
Environment.Exit(0);
}
}
internal static void CheckLocksHeld()
{
GC.WaitForPendingFinalizers();
if ((VsyncSystem.Debug & VsyncSystem.LOCKSTATE) != 0)
{
string s = LockObject.LocksIHold();
if (s.Equals(" ", StringComparison.Ordinal))
{
return;
}
}
}
internal static void ThreadTerminationMagic()
{
LockObject.ThreadIsTerminating();
}
// Attempt to trigger a very rapid failure notification
internal static void FDInform(Address who)
{
if (Vsync.ClientOf != null)
{
Vsync.ORACLE.doP2PSend(who, true, Vsync.FDETECTION, Vsync.my_address);
}
else
{
Vsync.ORACLE.doSend(false, false, Vsync.FDETECTION, Vsync.my_address);
}
}
internal static void GotPoison(string why)
{
if (shuttingDown)
{
return;
}
VsyncSystem.WriteAckInfo();
Shutdown("Got poison (reason=\"" + why + "\") in " + VsyncSystem.GetState());
throw new VsyncShutdownException("poison");
}
}
/// <summary>
/// Appears in a Query argument list to separate the by-ref "results" parameters from the arguments to the
/// Query method that will be invoked
/// </summary>
public class EOLMarker
{
}
/// <summary>
/// The Client class permits a non-member of a Group to relay requests through a "representative".
/// </summary>
public class Client
{
internal LockObject myRepLock;
internal string gname;
internal long[][] tsigs;
internal Address myRep;
internal static List<Client> Clients = new List<Client>();
internal static LockObject ClientsLock = new LockObject("ClientsLock");
/// <summary>
/// Constructor for a Client stub object associated with the specified group name.
/// </summary>
/// <remarks>
/// A Client request will be satisfied even if the
/// client process would not be allowed to join the actual group due to access control limits, so the group members must protect themselves
/// against access by clients who are not authorized to access the group data.
/// </remarks>
/// <param name="gname">Group name</param>
public Client(string gname)
{
if (!VsyncSystem.VsyncActive || VsyncSystem.VsyncRestarting)
{
throw new VsyncException(" New Client operation was requested but Vsync wasn't running yet");
}
using (var tmpLockObj = new LockAndElevate(ClientsLock))
{
foreach (Client cl in Clients)
{
if (cl.gname.Equals(gname, StringComparison.Ordinal))
{
throw new VsyncClientException("Attempted to create two Client objects with the same group name");
}
}
}
this.gname = gname;
this.myRepLock = new LockObject(gname + ".ClientLock");
if (!this.refreshRep())
{
throw new VsyncClientException("Client(" + gname + "): Group not found");
}
// Disables checking of types on RPC vectored via VSYNCMEMBERS: a hack for the client of a group mode
// This won't cause harm: VSYNCMEMBERS is only called from the Vsync library and usually in type-checked mode
// and the client's actual RPC call is type-checked in the Client class library
Vsync.VSYNCMEMBERS.isClientProxy = true;
using (var tmpLockObj = new LockAndElevate(ClientsLock))
{
Clients.Add(this);
}
}
/// <summary>
/// Check to see if there is an existing Client object for this group, return it if found
/// </summary>
/// <param name="gname">Group name</param>
/// <returns>Client object if any, null if not known</returns>
public static Client Lookup(string gname)
{
using (var tmpLockObj = new LockAndElevate(ClientsLock))
{
foreach (Client cl in Clients)
{
if (cl.gname.Equals(gname, StringComparison.Ordinal))
{
return cl;
}
}
}
return null;
}
internal static void ResetRep(string gname, Address newRep)
{
using (var tmpLockObj = new LockAndElevate(ClientsLock))
{
foreach (Client cg in Clients)
{
if (cg.gname.Equals(gname, StringComparison.Ordinal))
{
using (var tmpLockObj1 = new LockAndElevate(cg.myRepLock))
{
cg.myRep = newRep;
}
return;
}
}
}
}
private bool refreshRep()
{
byte[] ba;
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
this.myRep = null;
}
if (Vsync.ClientOf != null)
{
ba = Vsync.ORACLE.doP2PQuery(Vsync.ClientOf, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_FAILURE, "BECLIENT"), Vsync.BECLIENT, this.gname);
if (ba.Length > 0)
{
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
this.myRep = (Address)Msg.BArrayToObjects(ba, typeof(Address))[0];
}
}
}
else
{
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
this.myRep = SelectHisRep(this.gname);
this.tsigs = GetTSigs(this.gname);
}
}
if (this.myRep != null)
{
ba = Vsync.VSYNCMEMBERS.doP2PQuery(this.myRep, Vsync.BECLIENT, this.gname);
if (ba.Length > 0)
{
this.tsigs = (long[][])Msg.BArrayToObjects(ba, typeof(long[][]))[0];
}
else
{
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
this.myRep = null;
}
}
}
return this.myRep != null;
}
internal static int beClientCounter;
internal static Address SelectHisRep(string gname)
{
Address hisRep = Vsync.NULLADDRESS;
Group g = Group.TrackingProxyLookup(gname);
if (g != null && g.HasFirstView)
{
int rank = beClientCounter++ % g.theView.members.Length;
while (rank < g.theView.hasFailed.Length && g.theView.hasFailed[rank])
{
++rank;
}
if (rank != g.theView.hasFailed.Length)
{
hisRep = g.theView.members[rank];
}
}
return hisRep;
}
internal static long[][] GetTSigs(string gname)
{
Group g = Group.Lookup(gname);
if (g == null && Vsync.ClientOf == null)
{
g = Group.TrackingProxyLookup(gname);
}
if (g == null)
{
return new long[0][];
}
long[][] tsvec = new long[g.Handlers.ListofhLists.Length][];
for (int t = 0; t < tsvec.Length; t++)
{
if (g.Handlers.ListofhLists[t] != null && g.allowsClientRequests[t])
{
tsvec[t] = ComputeTSig(g.Handlers.ListofhLists[t]);
}
else
{
tsvec[t] = new long[0];
}
}
return tsvec;
}
internal static long[] ComputeTSig(Group.myHandlers mh)
{
long[] rv = new long[mh.hList.Count];
int idx = 0;
foreach (Group.CallBack cb in mh.hList)
{
ParameterInfo[] pi = cb.cbProc.hisCb.GetType().GetMethod("Invoke").GetParameters();
string s = string.Empty;
foreach (ParameterInfo pinfo in pi)
{
s += pinfo.ParameterType + ":";
}
rv[idx++] = TSHash(s);
}
return rv;
}
/// <summary>
/// Issues a peer-to-peer query to the group representative for this client, returns the result as a byte[] array.
/// </summary>
/// <remarks>
/// P2PQueryToBA issues a peer-to-peer query to the group representative for this client, returns the result as a byte[] array.
/// Normally, one would pass this array to Msg.BArrayToObjects or Msg.InvokeFromBArray. The first argument must be
/// the request identifier and the second argument a Vsync.Timeout object specifying how long to wait for a reply
/// and what action to take if a timeout occurs.
/// </remarks>
/// <param name="given">Variable-length list specifying request, timeout, parameters.</param>
/// <returns>byte[] vector encoding results; null or a zero-length array if unsuccessful</returns>
public byte[] P2PQueryToBA(params object[] given)
{
Timeout timeout;
Group.splitObs(null, out timeout, ref given);
this.CheckTypeSig(given);
byte[] ba = null;
Address rep = null;
int ntries = 0;
while (this.findRep(ref rep, ref ntries) && (ba = Vsync.VSYNCMEMBERS.doP2PQuery(rep, timeout, Vsync.CLIENTWRAPPED, this.gname, (int)given[0], Msg.toBArray(given))).Length == 0)
{
}
return ba;
}
/// <summary>
/// Issues a peer-to-peer query to the group representative for this client; returns 1 if a reply was received
/// and 0 on failure.
/// </summary>
/// <remarks>
/// P2PQuery issues a peer-to-peer query to the group representative for this client and returns an integer value
/// indicating how many replies were received (0 or 1). The first argument must be
/// the request identifier and the second argument a Vsync.Timeout object specifying how long to wait for a reply
/// and what action to take if a timeout occurs. The replies themselves are stored into variables specified
/// by reference in a list of by-ref reply objects that are separated from the argument list for the request handler
/// by a special EOLMarker.
/// </remarks>
/// <param name="given">Variable-length list specifying request, timeout, parameters, then an EOLMarker, then vectors in which to store results.</param>
/// <returns>0 if unsuccessful, 1 on success</returns>
public int P2PQuery(params object[] given)
{
Timeout timeout;
object[] resRefs;
Group.splitObs(null, out timeout, ref given, out resRefs);
this.CheckTypeSig(given);
byte[] ba = null;
Address rep = null;
int ntries = 0;
while (this.findRep(ref rep, ref ntries) && (ba = Vsync.VSYNCMEMBERS.doP2PQuery(rep, timeout, Vsync.CLIENTWRAPPED, this.gname, (int)given[0], Msg.toBArray(given))).Length == 0)
{
}
if (ba == null || ba.Length == 0)
{
return 0;
}
List<byte[]> bas = new List<byte[]> { ba };
Msg.BArraysToLists(resRefs, bas);
return 1;
}
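// Usage sketch for Client.P2PQuery (illustrative only; not code from the Vsync distribution). It assumes a group named
// "Quotes" whose members allow client requests on a hypothetical request code LOOKUP, handled by a method that takes a
// string and replies with a double. The group name, LOOKUP, the 5-second timeout and the use of a List<double> to
// receive the reply are all assumptions made for this example. LOOKUP is assumed to be an int constant, because
// P2PQuery casts the request code to int.
//
//   Client quotes = Client.Lookup("Quotes") ?? new Client("Quotes");
//   List<double> prices = new List<double>();
//   int nReplies = quotes.P2PQuery(LOOKUP, new Timeout(5000, Timeout.TO_ABORTREPLY, "LOOKUP"), "IBM", new EOLMarker(), prices);
//   if (nReplies == 1)
//   {
//       // The representative replied; the result has been stored through the by-ref argument after the EOLMarker.
//   }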
/// <summary>
/// Issues a peer-to-peer query to the group representative for this client, then invokes a delegate specified by the caller
/// in which received replies are provided as vectors of the corresponding object types, one entry per reply received.
/// </summary>
/// <remarks>
/// P2PQueryInvoke issues a peer-to-peer query to the group representative for this client, then invokes a delegate specified by the caller
/// in which received replies are provided as vectors of the corresponding object types, one entry per reply received.
/// The first argument must be
/// the request identifier and the second argument a Vsync.Timeout object specifying how long to wait for a reply
/// and what action to take if a timeout occurs. The delegate that will handle the replies is separated from the argument list for the request handler
/// by a special EOLMarker.</remarks>
/// <param name="given">Variable-length list specifying request, timeout, parameters, then an EOLMarker, then the delegate that will receive the results.</param>
public void P2PQueryInvoke(params object[] given)
{
Timeout timeout;
Group.splitObs(null, out timeout, ref given);
this.CheckTypeSig(given);
Delegate del = (Delegate)given[given.Length - 1];
Vsync.ArrayResize(ref given, given.Length - 1);
byte[] ba = null;
Address rep = null;
int ntries = 0;
while (this.findRep(ref rep, ref ntries) && (ba = Vsync.VSYNCMEMBERS.doP2PQuery(rep, timeout, Vsync.CLIENTWRAPPED, this.gname, (int)given[0], Msg.toBArray(given))).Length == 0)
{
}
if (ba == null || ba.Length == 0)
{
return;
}
Msg.InvokeFromBArray(ba, del);
}
// In this code we need to deal with the annoyance that a NullReply causes us to get no replies, but so would the failure of
// the selected representative. So if findRep is called a second or later time, it checks to see if the representative is
// the same one. If so, we return a NullReply to the user. If not (that is, if we were assigned a new representative) we reissue the
// request. But we do this no more than twice.
internal bool findRep(ref Address rep, ref int ntries)
{
Address oldRep = rep;
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
rep = this.myRep;
}
if (oldRep != null && rep != null && oldRep == rep)
{
// NullReply case
return false;
}
if (rep == null)
{
if (!this.refreshRep())
{
// Group isn't running
return false;
}
using (var tmpLockObj = new LockAndElevate(this.myRepLock))
{
rep = this.myRep;
}
if (oldRep != null && oldRep == rep)
{
// Not likely to occur, but call this another instance of the NullReply scenario
return false;
}
}
if (ntries++ == 2)
{
throw new VsyncAbortReplyException("This request is apparently causing group members to crash!");
}
FlowControl.FCBarrierCheck();
return true;
}
internal void CheckTypeSig(object[] obs)
{
int req = rcode(obs) + Vsync.SYSTEMREQS;
long mySig = this.ComputeTypeSig(obs);
if (this.tsigs == null || this.myRep == null)
{
throw new ArgumentException("Client initializer hasn't completed");
}
if (req >= this.tsigs.Length || this.tsigs[req].Length == 0)
{
throw new ArgumentException("Group doesn't allow client calls to request code=" + rcode(obs));
}
bool fnd = false;
foreach (long ts in this.tsigs[req])
{
if (ts == mySig)
{
fnd = true;
break;
}
}
if (!fnd)
{
string ts = " ";
for (int i = 1; i < obs.Length; i++)
{
if (i > 1)
{
ts += ", ";
}
ts += obs[i].GetType().ToString();
}
throw new ArgumentException("Group doesn't have a handler for request code=" + rcode(obs) + " matching type signature (" + ts + " )");
}
}
internal long ComputeTypeSig(object[] obs)
{
string sig = string.Empty;
for (int i = 1; i < obs.Length; i++)
{
sig += obs[i].GetType() + ":";
}
return TSHash(sig);
}
private static long TSHash(string sig)
{
using (MemoryStream ms = new MemoryStream(Msg.StringToBytes(sig)))
{
using (HMAC hm = new HMACSHA256(new byte[] { 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11 }))
using (var tmpLockObj = new LockAndElevate(Msg.VerifyLock))
{
byte[] ba = hm.ComputeHash(ms);
long rval = 0;
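// Fold the digest into a single value by XOR: byte i lands at bit offset 0, 8, 16 or 24,
// so only the low 32 bits of rval are ever set.
for (int i = 0; i < ba.Length; i++)
{
rval ^= (((long)ba[i]) & 0xFF) << ((i & 3) << 3);
}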
return rval;
}
}
}
private static int rcode(object[] obs)
{
int rval;
if (obs[0] is byte)
{
rval = (byte)obs[0];
}
else if (obs[0] is int)
{
rval = (int)obs[0];
}
else
{
throw new ArgumentException("Client-to-group request: can't identify the request code");
}
return rval;
}
}
/// <exclude></exclude>
public interface QKD
{
/// <exclude></exclude>
Address[] GetDestsToArray(Group dht, int vid);
/// <exclude></exclude>
List<Address> GetDests(Group dht);
/// <exclude></exclude>
byte[] ToBArray();
}
/// <summary>
/// Aggregation key for use when aggregating a DHT query
/// </summary>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class QueryKey<KT> : IComparable, IComparable<QueryKey<KT>>, IEquatable<QueryKey<KT>>, QKD
{
/// <exclude></exclude>
[ProtoMember(1)]
public readonly Address initiator;
/// <exclude></exclude>
[ProtoMember(2)]
public readonly int uid;
/// <exclude></exclude>
[ProtoMember(3)]
public List<KT> keys;
/// <exclude></exclude>
[ProtoMember(4)]
public readonly bool includeInitiator = true;
private static int nextid;
#if PROTOCOL_BUFFERS
[ProtoAfterDeserialization]
private void AfterDeserialize()
{
if (this.keys == null)
{
this.keys = new List<KT>(0);
}
}
#else
/// <exclude></exclude>
public QueryKey()
{
}
#endif
/// <summary>
/// Constructor takes a list of keys
/// </summary>
/// <param name="keys">the keys used in this query</param>
public QueryKey(IEnumerable<KT> keys)
: this(Vsync.my_address, keys, true)
{
}
internal QueryKey(Address initiator, IEnumerable<KT> keys)
: this(initiator, keys, true)
{
}
internal QueryKey(Address initiator, IEnumerable<KT> keys, bool includeInitiator)
{
// We need to copy it into a form the Vsync marshaller can represent...
this.keys = new List<KT>();
foreach (KT k in keys)
{
this.keys.Add(k);
}
if (this.keys.Count == 0)
{
throw new VsyncException("QueryKey: key-list can't be empty");
}
this.initiator = initiator;
this.includeInitiator = includeInitiator;
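// nextid is static and may be bumped from several threads; increment it atomically
this.uid = Interlocked.Increment(ref nextid);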
}
/// <exclude></exclude>
public byte[] ToBArray()
{
return Msg.toBArray(this.keys, this.initiator, this.uid, this.includeInitiator);
}
/// <exclude></exclude>
public QueryKey(List<KT> keys, Address initiator, int uid, bool includeInitiator)
{
this.keys = keys;
this.initiator = initiator;
this.uid = uid;
this.includeInitiator = includeInitiator;
}
private int oldVid = -1;
private List<Address> oldParticipants;
private Address[] oldParticipantsAsArray;
private readonly LockObject opLock = new LockObject("opLock");
/// <exclude></exclude>
public Address[] GetDestsToArray(Group dht, int vid)
{
using (var tmpLockObj = new LockAndElevate(this.opLock))
{
if (vid == this.oldVid && this.oldParticipants != null)
{
return this.oldParticipantsAsArray ?? (this.oldParticipantsAsArray = this.oldParticipants.ToArray());
}
if (vid < this.oldVid)
{
return null;
}
}
return this.GetDests(dht).ToArray();
}
/// <exclude></exclude>
public List<Address> GetDests(Group dht)
{
View theView;
using (var tmpLockObj = new LockAndElevate(dht.ViewLock))
{
theView = dht.theView;
}
using (var tmpLockObj = new LockAndElevate(this.opLock))
{
if (theView.viewid == this.oldVid && this.oldParticipants != null)
{
return this.oldParticipants;
}
}
List<Address> dests = new List<Address>();
if (this.includeInitiator)
{
dests.Add(this.initiator);
}
int iAg = dht.GetAffinityGroup(theView, this.initiator);
int khash = this.initiator.GetHashCode();
foreach (KT key in this.keys)
{
khash ^= key.GetHashCode();
}
khash = Math.Abs(khash >> 3);
foreach (KT key in this.keys)
{
int kAg = dht.GetAffinityGroup(dht.DHTKeyHash(key));
if (kAg != iAg || !this.includeInitiator)
{
int offset = (khash % dht.myDHTBinSize) * dht.myDHTnShards;
if (offset + kAg >= theView.members.Length)
{
offset = 0;
}
if (offset + kAg < theView.members.Length)
{
Address dest = theView.members[offset + kAg];
if (!dests.Contains(dest))
{
dests.Add(dest);
}
}
else if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("WARNING: GetDests encountered a depopulated affinity group for key=" + key);
}
}
}
using (var tmpLockObj = new LockAndElevate(this.opLock))
{
this.oldVid = theView.viewid;
this.oldParticipants = dests;
this.oldParticipantsAsArray = null;
}
return dests;
}
/// <exclude></exclude>
public static bool operator <(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) < 0;
}
/// <exclude></exclude>
public static bool operator >(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) > 0;
}
/// <exclude></exclude>
public static bool operator <=(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) <= 0;
}
/// <exclude></exclude>
public static bool operator >=(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) >= 0;
}
/// <exclude></exclude>
public static bool operator ==(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) == 0;
}
/// <exclude></exclude>
public static bool operator !=(QueryKey<KT> first, QueryKey<KT> second)
{
return Compare(first, second) != 0;
}
/// <exclude></exclude>
public static int Compare(QueryKey<KT> first, QueryKey<KT> second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
int comparison = first.uid.CompareTo(second.uid);
if (comparison != 0)
{
return comparison;
}
comparison = first.initiator.CompareTo(second.initiator);
return comparison;
}
/// <exclude></exclude>
public int CompareTo(object other)
{
return Compare(this, other as QueryKey<KT>);
}
/// <exclude></exclude>
public int CompareTo(QueryKey<KT> other)
{
return Compare(this, other);
}
/// <exclude></exclude>
public override bool Equals(object other)
{
return Compare(this, other as QueryKey<KT>) == 0;
}
/// <exclude></exclude>
public bool Equals(QueryKey<KT> other)
{
return Compare(this, other) == 0;
}
/// <exclude></exclude>
public override int GetHashCode()
{
// Equality (see Compare) depends only on uid and initiator, so the hash code must not depend
// on the mutable key list; otherwise two equal keys could hash differently
return this.initiator.GetHashCode() + (this.uid * 77077);
}
/// <exclude></exclude>
public override string ToString()
{
string ks = " ";
foreach (KT k in this.keys)
{
ks += "<" + k + "> ";
}
return "QueryKey{" + ks + ", initiator=" + this.initiator + (this.includeInitiator ? "(included)" : "(not included)") + (this.oldVid == -1 ? " (haven't yet computed participants)" : ".... participants for vid=" + this.oldVid + " are " + (this.oldParticipants == null ? "<<unknown>>" : Address.VectorToString(this.oldParticipantsAsArray ?? this.oldParticipants.ToArray()))) + "}";
}
}
/// <summary>
/// The class corresponding to Vsync process groups, which are a central feature of the Vsync <i>virtual synchrony model</i>.
/// </summary>
public class Group : IComparable, IComparable<Group>, IEquatable<Group>, IDisposable
{
internal static Dictionary<Address, Group> VsyncGroups = new Dictionary<Address, Group>(100);
// Groups this node either belongs to, or is joining (theView is null while joining!)
internal static LockObject VsyncGroupsLock = new LockObject("VsyncGroupsLock", ThreadPriority.Highest);
internal static Dictionary<Address, Group> TPGroups = new Dictionary<Address, Group>(100);
// Tracking Proxy groups. A kind of shadow used by the ORACLE to track membership info
internal static LockObject TPGroupsLock = new LockObject("TPGroupsLock", ThreadPriority.Highest);
internal static List<Address> GroupRIPList = new List<Address>();
internal static LockObject GroupRIPLock = new LockObject("GroupRIPLock");
// Tracked by all Vsync nodes, for all groups
internal LockObject groupLock; // Use to lock/unlock entire group on entry from user
internal LockObject ViewLock = new LockObject("ViewLock", ThreadPriority.Highest); // Protects g.theView
internal LockObject TokenLock = new LockObject("TokenLock", ThreadPriority.Highest);
// Protects g.theToken, in large groups
internal volatile bool interesting = true; // Send a token asap
internal LockObject GroupFlagsLock = new LockObject("GroupFlagsLock"); // Touched often; hold very briefly!
internal LockObject SIFLock = new LockObject("SIFLock", ThreadPriority.Highest);
// Held while sending a message in fragments
internal LockObject SIFListLock = new LockObject("SIFListLock", ThreadPriority.Highest);
internal List<object[]> SIFList = null; // Non-null while fragmenting a big message
internal Semaphore CPSSema; // If waiting on a flush
internal volatile View theView; // Current
internal View nextView; // Non-null during the state transfer logic
internal Initializer theInitializer = null;
internal volatile bool initializationDone = false;
internal string gname;
internal string where;
internal Address gaddr;
// Fields active only while GroupOpen is true
internal Thread groupReaderThread;
internal Thread groupP2PReaderThread;
internal Thread groupIPMCReaderThread;
internal volatile bool GroupOpen = false; // Becomes false once group.leave()/group.Terminate() is called
internal bool TermSent = false;
internal bool WasOpen = false; // True if the group has ever been open
internal bool isTrackingProxy = false;
internal bool isClientProxy = false;
internal bool HasFirstView = false; // Becomes true when I get my very first view
internal bool CallbacksDone = false; // Becomes true after the NEWVIEW callbacks (if any) for the first view
internal int myVirtIPAddr = MCMDSocket.UNKNOWN; // IP address for this group, packed into an int
internal bool hasPhysMapping = false; // Used only for tracking proxies: group has an assigned IPMC address
internal int myPhysIPAddr = MCMDSocket.UNKNOWN;
// Again, used in tracking proxy to represent the assigned IPMC address
internal MCMDSocket my_socket;
internal volatile int nextMsgid = 0; // Sequential for group multicasts on a per-group basis
internal volatile int nRaw = 0; // Counts raw messages sent in a row
internal int lastLgAckID = -1;
internal int rcvdMcastsCnt = 0; // Incremented on received multicasts, zeroed every five minutes
internal int rcvdMcastsRate = 0; // Running average: newAvg = (newRate + oldAvg)/2;
internal bool getsMCADDR;
// True in a tracking proxy at the MCMD daemon if we're allocating a true IPMC address to this group
internal Group sameAs;
// If non-null in a tracking proxy at the MCMD daemon, tells us to use the same IPMC address as the sameAs group
internal bool LeaderMode = false; // Becomes true once this node is the active leader for the ORACLE
internal bool TakingOver = false;
// True when the old leader fails and I'm in the process of taking over from him
internal int myFirstLeadershipView; // ViewID when I first took over as large group owner (else 0)
internal int safeSendThreshold = ALL;
// The number of nodes that must ack a SafeSend before it will be delivered
internal durabilityMethod safeSendDurabilityMethod;
// The method to use for making a SafeSend durable. If none is specified, in-memory caching is used
internal List<KeyValuePair<Thread, Msg>> curMsgList = new List<KeyValuePair<Thread, Msg>>();
internal LockObject curMsgListLock = new LockObject("curMsgListLock", ThreadPriority.Highest);
internal List<Msg> Unstable = new List<Msg>();
// Lists messages that aren't yet stable because not yet fully acknowledged
internal volatile int UnstableCount;
internal LockObject UnstableLock = new LockObject("UnstableLock"); // Protects Unstable list
internal long RequestedMinStableAt;
// Vsync.NOW the last time this member sent a P2P inquiry requesting minStable from a sender
internal long SentStableAt; // Vsync.NOW the last time a stability message of some sort was sent
internal long LastSendAt; // Vsync.NOW when the last multicast to this group occurred
internal int CurrentBacklog; // Largest value of LocalBacklog sent since last STABILITYINFO multicast
internal volatile int PreviousBacklog; // Value of CurrentBacklog when last sent
internal BoundedBuffer incomingSends;
internal BoundedBuffer incomingP2P;
internal Semaphore Wedged = new Semaphore(1, int.MaxValue); // Used to wedge group during membership changes
internal ILFunc myLoggingFcn; // If logged, routine to call back to on events of interest
internal long myLoggingId;
internal const int IL_SEND = 0;
internal const int IL_SAFESEND = 1;
internal const int IL_ORDEREDSEND = 2;
internal const int IL_QUERY = 3;
internal const int IL_DELIVERY = 4;
internal const int IL_AGGWAIT = 5;
internal const int IL_AGGDVALUE = 6;
internal const int IL_AGGLVALUE = 7;
internal const int IL_AGGREGATE = 8;
internal const int IL_START = 0;
internal const int IL_DONE = 1;
internal volatile int flags; // Flags associated with this group
internal LockObject quiesceLock = new LockObject("quiesceLock");
internal int quiesceCnt;
internal Semaphore quiesceWait = new Semaphore(0, int.MaxValue);
// These flags come in two flavors. Some are "set once", when the group is first created, and for those we don't need to worry about locking before testing.
// The others are turned on and off at run time; for those, you must hold the lock named in the flag's comment before modifying or reading them.
internal const int G_ISLARGE = 0x01; // [set once, no lock needed] Group is large
internal const int G_USEUNICAST = 0x02; // [set once, no lock needed] Unicast only
internal const int G_USEIPMC = 0x04; // [set once, no lock needed] Use IPMC if possible
internal const int G_HASUAGG = 0x08;
// [set once, no lock needed] Contains one or more user defined aggregators, run tokens faster
internal const int G_TRACE = 0x10; // [set once, no lock needed] Trace events in this group
internal const int G_SECURE = 0x20; // [set once, no lock needed] True for a secure group
internal const int G_ISRAW = 0x40; // [set once, no lock needed] Disables vsync flush
internal const int G_TERMINATING = 0x80; // [needs quiesceLock] Preparing to close/terminate this group
internal const int G_WEDGED = 0x100; // [needs GroupFlagsLock] True while group is wedged for membership changes
internal const int G_SENDINGSTABILITY = 0x0200;
// [needs GroupFlagsLock] True if the group is in the process of sending its stability report
internal const int G_NEEDSTATEXFER = 0x400;
// [needs GroupFlagsLock] Wait until state xfer completes before delivering first message
internal const int G_USESOOB = 0x800; // Uses OOB subsystem
internal const int G_GETTINGMINSTABLE = 0x1000; // Checking minstable
internal static string[] flagNames = { "islarge", "useunicast", "useIPMC", "hasuagg", "trace", "secure", "raw", "terminating", "wedged", "sendingstability", "needstatexfer", "usesOOB", "getting-minstable" };
internal ManualResetEvent interruptLockWaits = new ManualResetEvent(false);
internal Vsync.PendingLeaderOps NotifyDALOnReply = null; // Non-null if the "do as leader" code is running
internal int uids; // Unique ids used in connection with the DAL logic
internal List<Msg> ToDo = new List<Msg>(); // Received and already acked, but had a future view-id
internal volatile int ToDoCount;
internal LockObject ToDoLock = new LockObject("ToDoLock");
internal List<byte[]> IPMCArrivedEarly = new List<byte[]>();
// Received and not acked; arrived during the first stages of group join
internal List<ReliableSender.MReplayMe> MsgArrivedEarly = new List<ReliableSender.MReplayMe>();
internal List<Msg> OutOfOrderQueue = new List<Msg>();
internal volatile int OutOfOrderQueueCount;
internal LockObject OutOfOrderQueueLock = new LockObject("outOfOrderQueueLock");
internal List<IPMCVinfo> stashedIPMCviews = new List<IPMCVinfo>();
internal LinkedList<object>[] AggList;
// For small-group aggregation: for each level, a list of the associated aggregations
internal LockObject AggListLock = new LockObject("AggListLock");
internal class svi
{
internal Address sender;
internal int vid;
internal int msgid;
internal svi(Address s, int v, int m)
{
this.sender = s;
this.vid = v;
this.msgid = m;
}
}
internal List<svi> desiredOrderQueue = new List<svi>();
internal volatile tokenInfo theToken; // My representation of this group's token
internal int gcollectedTo = -1; // Last value of aggstable used in garbage collecting lgPendingSendBuffer
internal long TypeSig;
internal string TypeSigStr = "<undef>";
internal bool joinFailed = false; // Relays info about join failure
internal string reason = string.Empty; // If join fails, this will give the reason
internal string myCheckpointFile; // If non-null, the file name into which checkpoints should be saved
internal int CheckpointFrequency = -1;
// If greater than 0, an interval, in ms, at which checkpoints should be made
internal FileStream myChkptStream = null;
// While checkpointing, the temporary file into which the checkpoint is being written
internal byte[] myAESkey = null; // A secret key for this group, if the group uses AES security
internal bool userSpecifiedKey = false; // True if the end-user specified the key
internal Aes myAes = null;
internal LockObject myAesLock = new LockObject("myAESLock");
internal ICryptoTransform myDecryptor = null;
/// <exclude>
/// <summary>
/// Internal Aggregation structure
/// </summary>
/// </exclude>
public class AggInfo : IComparable, IComparable<AggInfo>, IEquatable<AggInfo>
{
internal readonly Type KT;
internal readonly Type VT;
internal readonly string KVT;
// This is tricky. myFactory is the constructor used to create an object of type
// Aggregator<KeyType,ValueType>, which implements the IAggregateEventHandler interface
internal readonly ConstructorInfo myFactory;
// The arguments to pass into the factory
internal readonly object theGroup;
internal readonly object theDel;
// And the "last resort" action to take...
internal readonly Timeout theTimeout;
internal AggInfo(Type kt, Type vt, ConstructorInfo mc, object g, object d, Timeout theTO)
{
this.KT = kt;
this.VT = vt;
this.KVT = kt + ":" + vt;
this.myFactory = mc;
this.theGroup = g;
this.theDel = d;
this.theTimeout = theTO;
}
/// <exclude>
/// <summary>
/// Required public constructor
/// </summary>
/// </exclude>
public AggInfo()
{
}
/// <exclude></exclude>
public static bool operator <(AggInfo first, AggInfo second)
{
return Compare(first, second) < 0;
}
/// <exclude></exclude>
public static bool operator >(AggInfo first, AggInfo second)
{
return Compare(first, second) > 0;
}
/// <exclude></exclude>
public static bool operator <=(AggInfo first, AggInfo second)
{
return Compare(first, second) <= 0;
}
/// <exclude></exclude>
public static bool operator >=(AggInfo first, AggInfo second)
{
return Compare(first, second) >= 0;
}
/// <exclude></exclude>
public static bool operator ==(AggInfo first, AggInfo second)
{
return Equals(first, second);
}
/// <exclude></exclude>
public static bool operator !=(AggInfo first, AggInfo second)
{
return !Equals(first, second);
}
/// <exclude></exclude>
public static int Compare(AggInfo first, AggInfo second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
return string.CompareOrdinal(first.KVT, second.KVT);
}
/// <exclude></exclude>
public static bool Equals(AggInfo first, AggInfo second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (object.ReferenceEquals(first, null) || object.ReferenceEquals(second, null))
{
return false;
}
return first.VT == second.VT && first.KT == second.KT;
}
/// <exclude>
/// <summary>
/// Required equality test
/// </summary>
/// </exclude>
public override bool Equals(object other)
{
return Equals(this, other as AggInfo);
}
/// <exclude>
/// <summary>
/// Required equality test
/// </summary>
/// </exclude>
public bool Equals(AggInfo other)
{
return Equals(this, other);
}
/// <exclude>
/// <summary>
/// Required comparator
/// </summary>
/// </exclude>
public int CompareTo(object other)
{
return Compare(this, other as AggInfo);
}
/// <exclude>
/// <summary>
/// Required comparator
/// </summary>
/// </exclude>
public int CompareTo(AggInfo other)
{
return Compare(this, other);
}
/// <exclude>
/// <summary>
/// Required hash code computation
/// </summary>
/// </exclude>
public override int GetHashCode()
{
return (this.VT.GetHashCode() * 37) + this.KT.GetHashCode();
}
}
internal List<AggInfo> AggTypes = new List<AggInfo>(); // Aggregator types registered in this group
internal Address[] Dying = new Address[0]; // Leaving the Group
internal Address[] Joining = new Address[0]; // Joining the Group
/// <summary>
/// ALL members must reply
/// </summary>
public const int ALL = -1;
/// <summary>
/// At least a majority must reply
/// </summary>
public const int MAJORITY = -2;
/// <summary>
/// End of the list of arguments, start of the list of reply variables
/// </summary>
public static EOLMarker EOL = new EOLMarker();
// Used as a marker between arguments and reply variables list in Query
/// <exclude></exclude>
public static int Compare(Group first, Group second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
return first.gaddr.CompareTo(second.gaddr);
}
/// <exclude></exclude>
public static bool operator <(Group first, Group second)
{
return Compare(first, second) < 0;
}
/// <exclude></exclude>
public static bool operator >(Group first, Group second)
{
return Compare(first, second) > 0;
}
/// <exclude></exclude>
public static bool operator <=(Group first, Group second)
{
return Compare(first, second) <= 0;
}
/// <exclude></exclude>
public static bool operator >=(Group first, Group second)
{
return Compare(first, second) >= 0;
}
/// <exclude></exclude>
public static bool operator ==(Group first, Group second)
{
return Compare(first, second) == 0;
}
/// <exclude></exclude>
public static bool operator !=(Group first, Group second)
{
return Compare(first, second) != 0;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public int CompareTo(object other)
{
return Compare(this, other as Group);
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public int CompareTo(Group other)
{
return Compare(this, other);
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public override bool Equals(object other)
{
return Compare(this, other as Group) == 0;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public bool Equals(Group other)
{
return Compare(this, other) == 0;
}
/// <exclude>
/// <summary>
/// Callback class, internal
/// </summary>
/// </exclude>
public class CallBack
{
internal bool withLock;
internal Callable cbProc;
/// <exclude>
/// <summary>
/// Callback constructor, internal
/// </summary>
/// </exclude>
public CallBack(bool wl, Delegate d)
{
this.withLock = wl;
this.cbProc = new Callable(d);
}
}
internal class VHCallBack
{
internal bool withLock;
internal ViewHandler vhProc;
// Has ability to obtain the LLENTRY lock on gaddr before calling back but currently Vsync isn't
// using that feature because it was provoking deadlocks. But must worry now about vsync violations
internal VHCallBack(bool wl, ViewHandler vh)
{
this.withLock = wl;
this.vhProc = vh;
}
}
internal class UHCallBack
{
internal bool withLock;
internal UCallback uhProc;
// Has ability to obtain the LLENTRY lock on gaddr before calling back but currently Vsync isn't
// using that feature because it was provoking deadlocks. But must worry now about vsync violations
internal UHCallBack(bool wl, UCallback vh)
{
this.withLock = wl;
this.uhProc = vh;
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class myHandlers
{
internal List<CallBack> hList = new List<CallBack>();
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myHandlers operator +(myHandlers a, Delegate b)
{
if (a == null)
{
a = new myHandlers();
}
a.hList.Add(new CallBack(false, b));
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules. This is for the doRegister case
/// </summary>
/// </exclude>
public static myHandlers operator +(myHandlers a, CallBack b)
{
if (a == null)
{
a = new myHandlers();
}
a.hList.Add(b);
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules. Unregistering a handler is not supported, so this operator always throws
/// </summary>
/// </exclude>
public static myHandlers operator -(myHandlers a, object b)
{
throw new VsyncException("Error: attempt to unregister a handler (instead, Dispose of the group, then create a new handle)");
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class myWatches
{
internal List<Watcher> hList = new List<Watcher>();
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myWatches operator +(myWatches a, Watcher b)
{
if (a == null)
{
a = new myWatches();
}
a.hList.Add(b);
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myWatches operator -(myWatches a, Watcher b)
{
if (a == null)
{
a = new myWatches();
}
a.hList.Remove(b);
return a;
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class myVHandlers
{
internal List<VHCallBack> vhList = new List<VHCallBack>();
internal LockObject vhListLock = new LockObject("vhListLock");
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myVHandlers operator +(myVHandlers a, ViewHandler b)
{
using (var tmpLockObj = new LockAndElevate(a.vhListLock))
{
a.vhList.Add(new VHCallBack(false, b));
}
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myVHandlers operator -(myVHandlers a, ViewHandler b)
{
using (var tmpLockObj = new LockAndElevate(a.vhListLock))
{
foreach (VHCallBack v in a.vhList)
{
if (v.vhProc.Equals(b))
{
a.vhList.Remove(v);
return a;
}
}
}
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// <remarks>Puts handler at the front of the queue. For internal use only!</remarks>
/// </exclude>
public static myVHandlers operator *(myVHandlers a, ViewHandler b)
{
using (var tmpLockObj = new LockAndElevate(a.vhListLock))
{
// Copy and swap while holding the lock, so that a concurrent registration can't be lost
List<VHCallBack> newList = new List<VHCallBack> { new VHCallBack(false, b) };
newList.AddRange(a.vhList);
a.vhList = newList;
}
return a;
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class myUMHandlers
{
internal bool locked;
internal List<UHCallBack> uhList = new List<UHCallBack>();
internal LockObject uhListLock = new LockObject("uhListLock");
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myUMHandlers operator +(myUMHandlers a, UCallback b)
{
if (a.locked)
{
throw new VsyncException("Illegal attempt to register a handler after calling Group.Join()");
}
using (var tmpLockObj = new LockAndElevate(a.uhListLock))
{
a.uhList.Add(new UHCallBack(false, b));
}
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myUMHandlers operator -(myUMHandlers a, UCallback b)
{
throw new VsyncException("Error: attempt to unregister a handler (instead, Dispose of the group, then create a new handle)");
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class myUP2PHandlers
{
internal bool locked;
internal List<UHCallBack> uhList = new List<UHCallBack>();
internal LockObject uhListLock = new LockObject("uhListLock");
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myUP2PHandlers operator +(myUP2PHandlers a, UCallback b)
{
if (a.locked)
{
throw new VsyncException("Illegal attempt to register a handler after calling Group.Join()");
}
using (var tmpLockObj = new LockAndElevate(a.uhListLock))
{
a.uhList.Add(new UHCallBack(false, b));
}
return a;
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public static myUP2PHandlers operator -(myUP2PHandlers a, UCallback b)
{
throw new VsyncException("Error: attempt to unregister a handler (instead, Dispose of the group, then create a new handle)");
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class mimicVector
{
internal bool locked;
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public myHandlers[] ListofhLists = new myHandlers[0];
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
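/// <param name="i">Request code; offset by Vsync.SYSTEMREQS to index the underlying array</param>
/// <returns>The handler list for that request code, or null if none has been registered yet</returns>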
/// </exclude>
public myHandlers this[int i]
{
get
{
i += Vsync.SYSTEMREQS;
if (i >= this.ListofhLists.Length)
{
Vsync.ArrayResize(ref this.ListofhLists, i + 1);
}
return this.ListofhLists[i];
}
set
{
if (this.locked)
{
throw new VsyncException("Illegal attempt to register a handler after calling Group.Join()");
}
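i += Vsync.SYSTEMREQS;
if (i >= this.ListofhLists.Length)
{
// A direct assignment (with no prior get) must grow the vector too
Vsync.ArrayResize(ref this.ListofhLists, i + 1);
}
this.ListofhLists[i] = value;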
}
}
}
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public class mimicWVector
{
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
/// </exclude>
public Dictionary<Address, myWatches> ListofwLists = new Dictionary<Address, myWatches>(8);
private readonly LockObject ListofwListsLock = new LockObject("ListofwListsLock");
/// <exclude>
/// <summary>
/// Internal to Vsync; public to comply with C# scope rules
/// </summary>
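/// <param name="who">Address of the watched process</param>
/// <returns>The watch list for that process; an empty list is created on first access</returns>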
/// </exclude>
public myWatches this[Address who]
{
get
{
using (var tmpLockObj = new LockAndElevate(this.ListofwListsLock))
{
myWatches mw;
if (this.ListofwLists.TryGetValue(who, out mw))
{
return mw;
}
this.ListofwLists.Add(who, new myWatches());
return this.ListofwLists[who];
}
}
set
{
using (var tmpLockObj = new LockAndElevate(this.ListofwListsLock))
{
if (this.ListofwLists.ContainsKey(who))
{
this.ListofwLists[who] = value;
}
else
{
this.ListofwLists.Add(who, value);
}
}
}
}
}
/// <summary>
/// A vector of handlers for incoming requests and updates, to which users attach additional handlers for requests defined by their code.
/// </summary>
/// <remarks>
/// To register a request handler, one can call the explicit RegisterHandler method, but it is also possible (and
/// more elegant) to write code like this: <c>Handlers[request-id] += (myHandler)delegate(... args ...) { ..... code to handle requests with matching type signature .... };</c>
/// Your code will be invoked each time a message to the group arrives whose request-id and type signature match those of your handler.
/// </remarks>
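/// <example>
/// A minimal sketch of the registration pattern described in the remarks. It is illustrative only: the
/// request code <c>UPDATE</c>, the delegate type <c>myHandler</c>, and the variable <c>g</c> (a Group that
/// has not yet been joined) are assumptions made for this example and are not defined by Vsync.
/// <code>
/// // Hypothetical delegate type and request code, declared by the application
/// delegate void myHandler(string name, int value);
/// const int UPDATE = 0;
///
/// // Register before joining: the vector rejects new handlers once Group.Join() has been called
/// g.Handlers[UPDATE] += (myHandler)delegate(string name, int value)
/// {
///     Console.WriteLine("update " + name + " = " + value);
/// };
/// </code>
/// </example>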
public mimicVector Handlers = new mimicVector();
/// <summary>
/// A vector of handlers for watched processes.
/// </summary>
/// <remarks>
/// To register a watch on a group member, write code like this: <c>Watch[who] += (Vsync.Watcher)delegate(int ev) { };</c>
/// If you may need to cancel the watch, create a method and register it as follows: <c>Watch[who] += myWatcherMethod; </c>
/// In this case, you can use <c>Watch[who] -= myWatcherMethod; </c> to cancel the watch.
/// The event indicates whether the watched process joined the group (W_JOIN) or left the group (W_LEAVE).
/// </remarks>
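/// <example>
/// A sketch of the cancellable form, assuming a Group <c>g</c> and an Address <c>who</c> obtained elsewhere.
/// The method name <c>myWatcherMethod</c> is hypothetical; its single int parameter follows the watcher
/// signature used in the remarks.
/// <code>
/// void myWatcherMethod(int ev)
/// {
///     // ev reports W_JOIN or W_LEAVE for the watched process
/// }
///
/// g.Watch[who] += myWatcherMethod;   // start watching
/// g.Watch[who] -= myWatcherMethod;   // cancel the watch later
/// </code>
/// </example>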
///
public mimicWVector Watch = new mimicWVector();
internal SortedList<long, Msg>[] PendingQueue;
internal volatile int PendingQueueCount;
internal LockObject PendingQueueLock = new LockObject("PendingQueueLock", ThreadPriority.Highest);
internal ChkptChoser theChkptChoser;
internal List<ChkptMaker> theChkptMakers = new List<ChkptMaker>();
internal LockObject theChkptMakersLock = new LockObject("ChkptMakers");
internal volatile bool inhibitEOC = false;
/// <summary>
/// New view callback handlers.
/// </summary>
/// <remarks>
/// To register a ViewHandler, one can call the explicit RegisterViewCB method, but it is also possible (and
/// more elegant) to write code like this: ViewHandlers += (VHandler)delegate(View v) { ..... code to handle new views .... };
/// </remarks>
public myVHandlers ViewHandlers = new myVHandlers();
// The FlushHandler callback is for internal use only; an upcall occurs during PROPOSE but the code needs to be live
// or group view updates will wedge. Thus I don't consider this to be a safe call to expose to Vsync users
internal myVHandlers FlushHandlers = new myVHandlers();
internal LockObject ViewHandlersLock = new LockObject("ViewHandlersLock");
internal bool hasUniversalHandlers;
internal bool isP2PThread;
internal myUMHandlers UniversalMHandlers = new myUMHandlers();
internal myUP2PHandlers UniversalP2PHandlers = new myUP2PHandlers();
internal Group(string gname, Address gaddr, View v)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncException("Must call VsyncSystem.Start() first");
}
this.MakeChkpt = new ChkptMkr(this);
this.LoadChkpt = new ChkptLdr(this);
this.Initializer = new Initer(this);
this.gname = gname ?? "<unknown>";
this.gaddr = gaddr;
using (var tmpLockObj = new LockAndElevate(Group.GroupRIPLock))
{
if (Group.GroupRIPList.Contains(gaddr))
{
throw new VsyncException("Must not rejoin/recreate a group immediately after leave/terminate");
}
}
this.AddToGroupsList(gname, gaddr);
this.Setup();
this.NewView(v, "VsyncGroup create", null);
}
private void AddToGroupsList(string gname, Address gaddr)
{
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
Group g = kvp.Value;
if (g.gname.Equals(gname, StringComparison.Ordinal) || g.gaddr == gaddr || g == this)
{
throw new VsyncException("AddToGroupsList(" + gname + "/" + gaddr + "): but this group name or gaddr already exists (" + g.gname + "/" + g.gaddr + ")");
}
}
// These need to be big enough to deal with arriving fragmented messages, and totally ordered multicasts, which can both create big bursty loads
this.incomingSends = new BoundedBuffer(gname + ":DeliverInOrder(IPMC)", 5120, ILock.LLDELIVERY, -1, -1);
this.incomingP2P = new BoundedBuffer(gname + ":DeliverInOrder(P2P)", 1024, ILock.LLDELIVERY, -1, -1);
if (VsyncGroups.ContainsKey(gaddr))
{
throw new VsyncException("AddToGroupsList: already listed");
}
VsyncGroups.Add(gaddr, this);
}
}
/// <summary>
/// The most standard way of constructing a new Group object: the user specifies the name and lets Vsync fill in other data.
/// Once the Group handle is obtained, it is important to register handlers, data types, aggregators, etc, prior to calling Join or Create.
/// </summary>
/// <param name="name">Group name</param>
public Group(string name)
{
if (!name.Equals("ORACLE", StringComparison.Ordinal) && !name.Equals("VSYNCMEMBERS", StringComparison.Ordinal) && (!VsyncSystem.VsyncActive || VsyncSystem.VsyncRestarting))
{
throw new VsyncException("Group Join/Create operation was requested but Vsync wasn't running yet");
}
this.MakeChkpt = new ChkptMkr(this);
this.LoadChkpt = new ChkptLdr(this);
this.Initializer = new Initer(this);
this.groupLock = new LockObject("<" + name + ">.GroupLock", ThreadPriority.Highest);
this.Bind(name);
// Group is uniquely named by hashing its name to a virtual IPMC address, but won't use IPMC unless
// we have permission, and even then, only if Dr. Multicast (the MCMD) assigns a physical IP address
// to this group (or to some set of groups that includes this one)
long addr = (Vsync.CLASSD + Vsync.VSYNC_MCRANGE_LOW + Address.GroupNameHash(this.gname)) & 0xFFFFFFFFL;
this.gaddr = new Address(Vsync.LastIPv4(MCMDSocket.PMCAddr((int)addr)), 0);
using (var tmpLockObj = new LockAndElevate(Group.GroupRIPLock))
{
if (Group.GroupRIPList.Contains(this.gaddr))
{
throw new VsyncException("Must not rejoin/recreate a group immediately after leave/terminate");
}
}
this.AddToGroupsList(this.gname, this.gaddr);
this.Setup();
}
private Group()
{
this.MakeChkpt = new ChkptMkr(this);
this.LoadChkpt = new ChkptLdr(this);
this.Initializer = new Initer(this);
this.groupLock = new LockObject("<anonymous group>.Lock");
}
internal volatile bool[] allowsClientRequests = new bool[512];
/// <summary>
/// Enables a handler to accept P2PQuery requests from group clients
/// </summary>
/// <param name="request">The request code</param>
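/// <example>
/// A sketch of typical use, assuming a hypothetical request code <c>LOOKUP</c> defined by the application and a
/// Group handle <c>g</c> obtained earlier. The code is offset by Vsync.SYSTEMREQS internally, and a VsyncException
/// is thrown if the offset value falls outside 0..511.
/// <code>
/// const int LOOKUP = 1;            // hypothetical application-defined request id
/// g.AllowClientRequests(LOOKUP);   // clients of the group may now issue P2PQuery requests with this code
/// </code>
/// </example>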
public void AllowClientRequests(int request)
{
request += Vsync.SYSTEMREQS;
if (request >= 0 && request <= 511)
{
this.allowsClientRequests[request] = true;
}
else
{
throw new VsyncException("illegal request code");
}
}
/// <summary>
/// Redirects the role of representing this client to the specified new representative. Pending requests will be reissued.
/// </summary>
/// <param name="Client">A client of the group currently bound to the caller</param>
/// <param name="newRepresentative">A group member that will handle this client's requests in the future</param>
public void RedirectClient(Address Client, Address newRepresentative)
{
if (this.GetRankOf(newRepresentative) == -1 || this.GetRankOf(Client) != -1 || Vsync.VSYNCMEMBERS.GetRankOf(Client) == -1)
{
throw new ArgumentException("Illegal argument to Group.RedirectClient");
}
Vsync.VSYNCMEMBERS.doP2PSend(Client, true, Vsync.BECLIENT, this.gname, newRepresentative);
}
/// <exclude>
/// <summary>
/// Internal, pretty-prints information about tracking proxies in ORACLE members
/// </summary>
/// <param name="tpg">Tracking proxy</param>
/// <returns>Group state as a string</returns>
/// </exclude>
public static string TPtoString(Group tpg)
{
string s = ((tpg.flags & G_ISLARGE) != 0 ? "Large: " : string.Empty) + "<" + tpg.gname + "/" + tpg.gaddr + "/" + tpg.where + "> " + "[VIP: " + MCMDSocket.PMCAddr(tpg.myVirtIPAddr) + ", PIP: " + MCMDSocket.PMCAddr(tpg.myPhysIPAddr) + "] ";
if (tpg.theView != null)
{
return s + " vid " + tpg.theView.viewid + ": next mid=" + tpg.theView.viewid + ":" + tpg.nextMsgid + ", rate " + tpg.rcvdMcastsRate + ", {" + Address.VectorToString(tpg.theView.members) + "}";
}
return s + " view unknown";
}
// Called only in the ORACLE, to create a new Tracking Proxy
internal static Group TrackingProxy(string gname, string where, Address gaddr, long tsig, int[] mm, View v, int flags, bool MapAddr)
{
Group g = new Group { isTrackingProxy = true, flags = flags & (G_ISLARGE | G_USEIPMC | G_USEUNICAST), where = where, gaddr = gaddr, gname = gname, TypeSig = tsig };
if (mm != null && mm[MCMDSocket.VIRTUAL] != MCMDSocket.UNKNOWN)
{
g.myVirtIPAddr = mm[MCMDSocket.VIRTUAL];
g.myPhysIPAddr = mm[MCMDSocket.PHYSICAL];
}
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
g.theView = v;
}
g.GroupOpen = g.WasOpen = true;
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
if (TPGroups.ContainsKey(g.gaddr))
{
Group oldGroup = TPGroups[g.gaddr];
if (oldGroup.theView == null || (v != null && v.viewid > oldGroup.theView.viewid))
{
using (var tmpLockObj1 = new LockAndElevate(oldGroup.ViewLock))
{
oldGroup.theView = v;
}
}
g = oldGroup;
}
else
{
TPGroups.Add(g.gaddr, g);
}
}
if (MapAddr)
{
MCMDSocket.SetMap("create TrackingProxy", gname, true, MCMDSocket.GetMap(g.gaddr, false));
}
ReliableSender.StartGroupReader(g);
return g;
}
internal static Group TrackingProxyLookup(string gname)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in TPGroups)
{
if (kvp.Value.gname.Equals(gname, StringComparison.Ordinal))
{
return kvp.Value;
}
}
}
return null;
}
internal static Group TrackingProxyLookup(Address gaddr)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
Group g;
if (TPGroups.TryGetValue(gaddr, out g))
{
return g;
}
}
return null;
}
internal static void TrackingProxyDelete(Address gaddr)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
TPGroups.Remove(gaddr);
}
}
internal void TPGroupsLearnedMM(int[] mm)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in TPGroups)
{
if (kvp.Value.myVirtIPAddr == mm[MCMDSocket.VIRTUAL])
{
kvp.Value.myPhysIPAddr = mm[MCMDSocket.PHYSICAL];
return;
}
}
}
}
internal int[] MCMDMAP()
{
return new[] { this.myVirtIPAddr, this.myPhysIPAddr };
}
internal static int[] MCMDMAP(Address gaddr)
{
Group tpg = TrackingProxyLookup(gaddr);
if (tpg != null)
{
return tpg.MCMDMAP();
}
return new[] { MCMDSocket.UNKNOWN, MCMDSocket.UNKNOWN };
}
internal static List<Group> VsyncGroupsClone()
{
List<Group> theClone = new List<Group>();
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
theClone.Add(kvp.Value);
}
}
return theClone;
}
internal static List<Group> VsyncAllGroupsClone(bool removeDups)
{
List<Group> theClone = new List<Group>();
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in TPGroups)
{
theClone.Add(kvp.Value);
}
}
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
Group ig = kvp.Value;
bool fnd = false;
if (removeDups)
{
foreach (Group cg in theClone)
{
if (ig.gaddr == cg.gaddr)
{
fnd = true;
break;
}
}
}
if (!fnd)
{
theClone.Add(ig);
}
}
}
return theClone;
}
private void CheckState()
{
if (this.theView != null)
{
throw new VsyncException("Can't change attributes on an open group");
}
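// Only the combination is illegal: UseUnicast and UseIPMC each set their own flag before calling CheckState
if ((this.flags & (G_USEIPMC | G_USEUNICAST)) == (G_USEIPMC | G_USEUNICAST))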
{
throw new VsyncException("Can't request both useUnicast and useIPMC on same group");
}
}
/// <summary>
/// Specifies that the user wishes this group to use the large group protocols.
/// </summary>
/// <remarks>
/// Recommended if a group may have more than about 10 members, although the normal protocols
/// should remain stable up to at least 100 members. Must be called before the group is joined or created.
/// </remarks>
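/// <example>
/// A sketch of the intended call order, assuming a hypothetical group name. Attributes such as this one can only
/// be changed while the group has no view, so the call has to come before Join or Create.
/// <code>
/// Group g = new Group("bigGroup");   // hypothetical group name
/// g.SetLarge();                      // throws VsyncException if the group is already open;
///                                    // ignored with a warning when VSYNC_UNICAST_ONLY is set
/// // ... register handlers, then call Join or Create
/// </code>
/// </example>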
public void SetLarge()
{
if (Vsync.VSYNC_UNICAST_ONLY)
{
Vsync.WriteLine("WARNING: g.SetLarge is not supported with VSYNC_UNICAST_ONLY for this version of Vsync; ignored.");
return;
}
this.CheckState();
this.flags |= G_ISLARGE;
this.LGSetup();
}
/// <summary>
/// Marks this group as one that will only be used with RawSend and the other Raw IPC primitives, so the "flush" step of virtual synchrony is skipped.
/// </summary>
public void SetRaw()
{
this.flags |= G_ISRAW;
}
/// <summary>
/// Specifies a logging function for this group
/// </summary>
/// <param name="lfunc">The function</param>
public void SetLogged(ILFunc lfunc)
{
this.myLoggingFcn = lfunc;
}
/// <summary>
/// Enable or disable a per-group trace of events, which prints to the console and also to the Vsync log file
/// </summary>
/// <param name="onOff">True to enable trace, false to disable trace</param>
public void Trace(bool onOff)
{
if (onOff)
{
this.flags |= G_TRACE;
}
else
{
this.flags &= ~G_TRACE;
}
}
/// <summary>
/// Specifies that the user wishes this group to use only unicast (point to point via UDP) messaging.
/// </summary>
/// <remarks>
/// Removes this group from the ones considered for the Dr. Multicast optimal resource allocation by claiming that the received messages rate is 0
/// </remarks>
public void UseUnicast()
{
this.flags |= G_USEUNICAST;
this.CheckState();
}
/// <summary>
/// Specifies that the user wishes this group to use an IP multicast address if possible
/// </summary>
/// <remarks>
/// Forces this group to the top of the sort order used in the Dr. Multicast resource allocation scheme by claiming the
/// received message rate is infinitely high
/// </remarks>
public void UseIPMC()
{
this.flags |= G_USEIPMC;
this.CheckState();
}
/// <summary>
/// Returns a list of the failed members of the current group, if any
/// </summary>
/// <returns>List of members that have failed (should quickly be reported as leaving in a new view)</returns>
public Address[] GetFailedMembers()
{
this.ConfirmJoined();
return this.theView.GetFailedMembers();
}
/// <summary>
/// Returns a list of the live members of the current group, normally all members of the current view
/// </summary>
/// <returns>List of members that are live. Normally this will be the full membership, since new views are reported quickly.</returns>
public Address[] GetLiveMembers()
{
this.ConfirmJoined();
return this.theView.GetLiveMembers();
}
private void ConfirmJoined()
{
if (this.theView == null)
{
throw new VsyncException("Must Join before accessing membership data");
}
}
/// <summary>
/// Returns a list of the full membership of this view, including members that have just been noted as faulty but not yet reported via a new view
/// </summary>
/// <returns>Membership list</returns>
public Address[] GetMembers()
{
this.ConfirmJoined();
return this.theView.members;
}
/// <summary>
/// Returns the caller's rank in the group view
/// </summary>
/// <returns>Rank from 0..(N-1) as an offset into the full membership list of the group</returns>
public int GetMyRank()
{
this.ConfirmJoined();
return this.theView.GetMyRank();
}
/// <summary>
/// Returns the rank of a designated process in the group view
/// </summary>
/// <param name="who">Address of the process whose rank is requested</param>
/// <returns>Rank from 0..(N-1) as an offset into the full membership list of the group, or -1 if the process is not a group member.</returns>
public int GetRankOf(Address who)
{
this.ConfirmJoined();
return this.theView.GetRankOf(who);
}
/// <summary>
/// Gets the length of the membership list
/// </summary>
/// <returns>N, where the membership consists of N members ranked 0..(N-1)</returns>
public int GetSize()
{
this.ConfirmJoined();
return this.theView.GetSize();
}
internal class SUTW : IComparable, IComparable<SUTW>, IEquatable<SUTW>
{
// Unique ID is Sender::Uid; it "names" this SafeSend. Current ordering is Who::TS, order is final when committed flag is true
internal Address Sender;
internal Address Who;
internal int Uid;
internal int TS;
internal bool commitFlag = false;
internal SUTW(Address s, int u, int t, Address w)
{
this.Sender = s;
this.Uid = u;
this.TS = t;
this.Who = w;
}
public static bool operator <(SUTW first, SUTW second)
{
return Compare(first, second) < 0;
}
public static bool operator >(SUTW first, SUTW second)
{
return Compare(first, second) > 0;
}
public static bool operator <=(SUTW first, SUTW second)
{
return Compare(first, second) <= 0;
}
public static bool operator >=(SUTW first, SUTW second)
{
return Compare(first, second) >= 0;
}
public static bool operator ==(SUTW first, SUTW second)
{
return Compare(first, second) == 0;
}
public static bool operator !=(SUTW first, SUTW second)
{
return Compare(first, second) != 0;
}
public static int Compare(SUTW first, SUTW second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
int comparison = first.TS.CompareTo(second.TS);
if (comparison != 0)
{
return comparison;
}
comparison = first.Who.CompareTo(second.Who);
return comparison;
}
public int CompareTo(object other)
{
return Compare(this, other as SUTW);
}
public int CompareTo(SUTW other)
{
return Compare(this, other);
}
public override bool Equals(object other)
{
return Compare(this, other as SUTW) == 0;
}
public bool Equals(SUTW other)
{
return Compare(this, other) == 0;
}
public override int GetHashCode()
{
return this.Who.GetHashCode() ^ (this.TS * 11);
}
}
private SortedList<SUTW, Msg> SSList = new SortedList<SUTW, Msg>();
internal LockObject SSLock = new LockObject("SSLock");
internal int logicalTS;
internal LockObject CommitLock = new LockObject("CommitLock");
internal LockObject ProposeLock = new LockObject("ProposeLock");
internal class ctuple
{
internal int senderRank;
internal int[] theVT;
internal Msg theMsg;
internal long whenEnqueued;
internal ctuple(int sr, int[] vt, Msg m)
{
this.senderRank = sr;
this.theVT = vt;
this.theMsg = m;
this.whenEnqueued = Vsync.NOW;
}
}
internal LockObject CausalOrderListLock = new LockObject("CausalOrderListLock");
internal List<ctuple> CausalOrderList = new List<ctuple>();
internal volatile int CausalOrderListCount;
internal LockObject OrderedSubsetListLock = new LockObject("OrderedSubsetListLock");
internal SortedList<osspq, Msg> OrderedSubsetPQ = new SortedList<osspq, Msg>();
internal volatile int OrderedSubsetPQCount = 0;
internal LockObject RecentOpqNodesLock = new LockObject("RecentOpqNodesLock");
internal List<osspq> RecentOpqNodes = new List<osspq>();
internal long myTS;
/// <exclude></exclude>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class osspq : IComparable, IComparable<osspq>, IEquatable<osspq>
{
/// <exclude></exclude>
[ProtoMember(1)]
public bool cflag = false;
/// <exclude></exclude>
[ProtoMember(2)]
public long proposedTS;
/// <exclude></exclude>
[ProtoMember(3)]
public Address proposedWho;
/// <exclude></exclude>
[ProtoMember(4)]
public List<Address> dests;
/// <exclude></exclude>
[ProtoMember(5)]
public readonly Address sender;
/// <exclude></exclude>
[ProtoMember(6)]
public readonly int vid;
/// <exclude></exclude>
[ProtoMember(7)]
public readonly int msgid;
internal long gtime;
#if PROTOCOL_BUFFERS
[ProtoAfterDeserialization]
private void AfterDeserialize()
{
if (this.dests == null)
{
this.dests = new List<Address>(0);
}
}
#else
/// <exclude></exclude>
public osspq()
{
}
#endif
internal osspq(long ts, Address who, List<Address> dests, Address sender, int vid, int msgid)
{
this.proposedTS = ts;
this.proposedWho = who;
this.dests = dests;
this.sender = sender;
this.vid = vid;
this.msgid = msgid;
}
/// <exclude></exclude>
public override string ToString()
{
return "OPQ[" + this.proposedTS + "::" + this.proposedWho + (this.cflag ? "** committed ** " : string.Empty) + ">; for msg " + this.sender + "::" + this.vid + ":" + this.msgid + "]";
}
internal void commit(long ts, Address who)
{
this.proposedTS = ts;
this.proposedWho = who;
this.cflag = true;
}
/// <exclude></exclude>
public static bool operator <(osspq first, osspq second)
{
return Compare(first, second) < 0;
}
/// <exclude></exclude>
public static bool operator >(osspq first, osspq second)
{
return Compare(first, second) > 0;
}
/// <exclude></exclude>
public static bool operator <=(osspq first, osspq second)
{
return Compare(first, second) <= 0;
}
/// <exclude></exclude>
public static bool operator >=(osspq first, osspq second)
{
return Compare(first, second) >= 0;
}
/// <exclude></exclude>
public static bool operator ==(osspq first, osspq second)
{
return Equals(first, second);
}
/// <exclude></exclude>
public static bool operator !=(osspq first, osspq second)
{
return !Equals(first, second);
}
/// <exclude></exclude>
public static int Compare(osspq first, osspq second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
int comparison = first.proposedTS.CompareTo(second.proposedTS);
if (comparison != 0)
{
return comparison;
}
comparison = first.proposedWho.CompareTo(second.proposedWho);
return comparison;
}
/// <exclude></exclude>
public static bool Equals(osspq first, osspq second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (object.ReferenceEquals(first, null) || object.ReferenceEquals(second, null))
{
return false;
}
return first.sender == second.sender && first.vid == second.vid && first.msgid == second.msgid;
}
/// <exclude></exclude>
public int CompareTo(object other)
{
return Compare(this, other as osspq);
}
/// <exclude></exclude>
public int CompareTo(osspq other)
{
return Compare(this, other);
}
/// <exclude></exclude>
public override bool Equals(object other)
{
return Equals(this, other as osspq);
}
/// <exclude></exclude>
public bool Equals(osspq other)
{
return Equals(this, other);
}
/// <exclude></exclude>
public override int GetHashCode()
{
return (this.vid * 771) + (this.msgid * 33033) + this.sender.GetHashCode();
}
}
private void Setup()
{
// None of these handlers is permitted to block, hence we fork a new thread for any that might need to wait for a long time
this.doRegister(Vsync.PARTITIONED, new Action<Address>(obj =>
{
if (!VsyncSystem.VsyncRestarting)
{
throw new VsyncException("Partitioning event, this node was in the minority partition");
}
}));
this.doRegister(Vsync.PROPOSE, new Action<Vsync.ViewDelta[], Vsync.UnstableList[]>((vds, usl) =>
{
if (!this.HasFirstView || this.theView.GetMyRank() == -1)
{
throw new VsyncException("Got Vsync.PROPOSED in ORACLE before receiving the first ORACLE view");
}
using (var tmpLockObj = new LockAndElevate(this.ProposeLock))
{
bool mustWait = false;
// We don't wedge the ORACLE since its only role is membership consensus and the scheme is our
// home-brew version of leader-based Paxos. If the ORACLE had any other kind of multicasts
// that would be a different story, but it doesn't and we're not going to add any
if (this != Vsync.ORACLE)
{
using (var tmpLockObj1 = new LockAndElevate(this.GroupFlagsLock))
{
if ((this.flags & G_WEDGED) == 0)
{
mustWait = true;
this.flags |= G_WEDGED;
}
}
}
if (mustWait)
{
ILock.NoteThreadState("Wedged(ORACLE).WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
string us = " ";
foreach (Vsync.UnstableList ul in usl)
{
us += ul.gaddr + "::" + ul.sender + ":" + ul.vid + "[" + ul.mid_low + "-" + ul.mid_hi + "] ";
}
Vsync.WriteLine("=== W E D G E D < " + this.gname + "> vid " + this.theView.viewid + ", initial UNSTABLE LIST = { " + us + "} ===");
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT | VsyncSystem.GROUPEVENTS)) != 0)
{
foreach (Vsync.ViewDelta vd in vds)
{
Vsync.WriteLine(" " + vd);
}
}
// Creating a new thread this way may seem risky: it eliminates the risk of a deadlock associated with doing a reply
// but now we have to worry about event ordering relative to other incoming messages. This is safe because the
// group is wedged and flushing. Under these conditions, nothing much can happen and we can use our new thread
// and thus avoid running into problems like deadlock if the limit on pending asynchronous p2p messages is reached.
Thread t = new Thread(() =>
{
try
{
// This do loop is here purely to avoid a "goto"
do
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT | VsyncSystem.GROUPEVENTS)) != 0)
{
foreach (Vsync.ViewDelta vd in vds)
{
Vsync.WriteLine(" " + vd);
}
}
// This next line is safe only because there is a single dispatch thread doing both the PROPOSE and INITIALVIEW callbacks!
if (!this.HasFirstView)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("PROPOSE handler Waiting on LLINITV barrier lock for gaddr " + this.gaddr + "(Lock [" + ILock.LLINITV + "][" + ILock.GetLockId(ILock.LLINITV, this.gaddr.GetHashCode()) + "])");
}
ILock.Barrier(ILock.LLINITV, this.gaddr).BarrierWait();
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("PROPOSE handler After LLINITV barrier lock for gaddr " + this.gaddr + "(Lock [" + ILock.LLINITV + "][" + ILock.GetLockId(ILock.LLINITV, this.gaddr.GetHashCode()) + "])");
}
}
Vsync.ViewDelta[] rvds = new Vsync.ViewDelta[vds.Length];
bool involvesOracle = false;
bool involvesMe = false;
Address[] leaving = new Address[0];
foreach (Vsync.ViewDelta vd in vds)
{
if (vd.gaddr == Vsync.ORACLE.gaddr)
{
involvesOracle = true;
}
if (vd.gaddr == this.gaddr)
{
involvesMe = true;
}
int len = vd.leavers.Length;
if (len > 0)
{
int oldlen = leaving.Length;
Vsync.ArrayResize(ref leaving, oldlen + len);
Array.Copy(vd.leavers, 0, leaving, oldlen, len);
}
}
Vsync.Proposed = vds;
if (!involvesMe || (this == Vsync.ORACLE && !involvesOracle))
{
// Special case: Oracle was "cc'ed" and should ack, but has no actual role
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("PROPOSE: doReply special case (ORACLE acks)");
}
this.doReply(Vsync.my_address, new Vsync.ViewDelta[0]);
break;
}
// Give the application a chance to flush, if using the new FlushHandler upcalls
using (var tmpLockObj1 = new LockAndElevate(this.FlushHandlers.vhListLock))
{
if (this.FlushHandlers.vhList.Count > 0)
{
View v;
using (var tmpLockObj2 = new LockAndElevate(this.ViewLock))
{
v = new View(this.theView.gname, this.theView.gaddr, this.theView.members, this.theView.viewid, this.theView.isLarge);
}
// Tinker with the View to report who will join or leave without actually modifying the list of members
foreach (Vsync.ViewDelta vd in vds)
{
if (vd.gaddr == this.gaddr && vd.prevVid == v.viewid)
{
Vsync.UpdateGroupView(false, vd, this, "PROPOSE:FLushUpcall", ref v, true);
foreach (VHCallBack vhcb in this.FlushHandlers.vhList)
{
vhcb.vhProc(v);
}
}
}
}
}
// Flush those I've been asked to forward. Code takes advantage of the group being in a WEDGED state: no new traffic allowed in...
using (Semaphore CPSSema = new Semaphore(0, int.MaxValue))
{
int cpscnt;
List<Msg> mustSend;
this.startFlush(int.MaxValue, usl, out mustSend, out cpscnt, CPSSema);
this.endFlush(leaving, mustSend, cpscnt, CPSSema);
}
// Now look up the groups, again, and this time note the final Msgids
// for those vds corresponding to "my" group
for (int idx = 0; idx < vds.Length; idx++)
{
Vsync.ViewDelta vd = vds[idx];
rvds[idx] = new Vsync.ViewDelta(vd.gname, vd.gaddr, 0L, Group.MCMDMAP(vd.gaddr), -1, new int[0], false);
if (vd.gaddr != this.gaddr)
{
continue;
}
if ((this.flags & G_ISLARGE) != 0)
{
throw new VsyncException("PROPOSE: vd applied to a LARGE group!");
}
if (this != Vsync.ORACLE || !this.IAmLeader())
{
this.theView.isFinal = true;
}
rvds[idx].prevVid = this.theView.viewid;
rvds[idx].lastSeqns = new int[this.theView.NextIncomingMsgID.Length - 1];
////string ls = " ";
for (int i = 0; i < this.theView.NextIncomingMsgID.Length - 1; i++)
{
////ls += this.theView.NextIncomingMsgID[i + 1] + " ";
rvds[idx].lastSeqns[i] = this.theView.NextIncomingMsgID[i + 1];
}
foreach (Address d in vd.leavers)
{
int r = this.theView.GetRankOf(d);
if (r != -1)
{
View.noteFailed(this, d);
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("PROPOSE: doReply normal case (processed " + rvds.Length + " view deltas), curMsg=" + this.getReplyTo());
}
this.doReply(Vsync.my_address, this == Vsync.ORACLE ? new Vsync.ViewDelta[0] : rvds);
}
while (false);
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
})
{ Name = "PROPOSED:DOFLUSH", Priority = ThreadPriority.Highest, IsBackground = true };
this.SetReplyThread(t);
t.Start();
}
}));
this.doRegister(Vsync.COMMIT, new Action<Vsync.ViewDelta[], Address[], int[]>((vds, who, uid) =>
{
if (!this.HasFirstView || this.theView.GetMyRank() == -1)
{
throw new VsyncException("Got Vsync.COMMIT in ORACLE before receiving the first ORACLE view");
}
using (var tmpLockObj = new LockAndElevate(this.groupLock))
using (var tmpLockObj1 = new LockAndElevate(this.CommitLock))
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
string s = "Received a new COMMIT message in group <" + this.gname + "> ViewDelta vector has " + vds.Length + " updates:" + Environment.NewLine;
foreach (Vsync.ViewDelta vd in vds)
{
s += " " + vd + Environment.NewLine;
}
Vsync.Write(s);
}
Vsync.ViewDelta[] nextvds = null;
do
{
if (this == Vsync.ORACLE)
{
// Look at "chunks" of the vds vector that end with an ORACLE join event.
int n;
for (n = 0; n < vds.Length - 1; n++)
{
if (vds[n].gaddr == Vsync.ORACLE.gaddr && vds[n].joiners.Length > 0)
{
break;
}
}
++n; // n is now the length of the chunk
if (n < vds.Length)
{
nextvds = new Vsync.ViewDelta[vds.Length - n];
for (int i = n; i < vds.Length; i++)
{
nextvds[i - n] = vds[i];
}
Vsync.ArrayResize(ref vds, n);
}
else
{
nextvds = null;
}
// Now there is a chunk in vds, and perhaps more work to do later in nextvds
// Ignore the current chunk if I've already seen it, or it predates my joining the ORACLE
if ((n = vds.Length - 1) < 0)
{
return;
}
if (vds[n].gaddr == Vsync.ORACLE.gaddr && vds[n].joiners.Length > 0)
{
using (var tmpLockObj2 = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
if (vds[n].prevVid < this.theView.viewid)
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
string s = "IGNORING STALE COMMIT actions in group <" + this.gname + "> (theView.viewid=" + this.theView.viewid + "), Ignored part of ViewDelta vector has " + vds.Length + " updates:";
foreach (Vsync.ViewDelta vd in vds)
{
s += " " + vd + Environment.NewLine;
}
Vsync.Write(s);
}
continue;
}
}
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
string s = "APPLYING COMMIT actions in group <" + this.gname + "> (theView.viewid=" + this.theView.viewid + "), this chunk of ViewDelta vector has " + vds.Length + " updates:";
foreach (Vsync.ViewDelta vd in vds)
{
s += " " + vd + Environment.NewLine;
}
Vsync.Write(s);
}
if (this != Vsync.ORACLE)
{
int myVid = 0;
using (var tmpLockObj2 = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
myVid = this.theView.viewid;
}
}
vds = vds.Where(vd => vd.gaddr == this.gaddr && vd.prevVid >= myVid).ToArray();
if (vds.Length == 0)
{
return;
}
}
this.gotNewViewDeltas(vds);
Vsync.CommitGVUpdates(this, vds);
if (this == Vsync.ORACLE)
{
for (int i = 0; i < who.Length; i++)
{
Vsync.PurgeGVE(who[i], uid[i]);
}
}
Vsync.Proposed = null;
if (this == Vsync.ORACLE)
{
using (var tmpLockObj2 = new LockAndElevate(Vsync.ORACLE.groupLock))
{
foreach (Vsync.ViewDelta vd in vds)
{
if (vd.gaddr == Vsync.ORACLE.gaddr)
{
Vsync.OracleJoinsUnderway -= vd.joiners.Length;
}
}
if (Vsync.OracleJoinsUnderway < 0)
{
Vsync.OracleJoinsUnderway = 0;
}
}
if (Vsync.ORACLE.IAmLeader())
{
ILock.Barrier(ILock.LLWAIT, ILock.LCOMMIT).BarrierRelease(1);
}
Vsync.SendInitialView(vds);
}
if (this != Vsync.ORACLE)
{
using (var tmpLockObj2 = new LockAndElevate(this.GroupFlagsLock))
{
if ((this.flags & G_WEDGED) != 0)
{
this.Wedged.Release();
this.flags &= ~G_WEDGED;
}
}
}
}
while ((vds = nextvds) != null);
}
List<Address> ripdup = new List<Address>();
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
foreach (Address rip in Vsync.RIPList)
{
ripdup.Add(rip);
}
}
foreach (Address rip in ripdup)
{
int rank = this.theView.GetRankOf(rip);
if (rank != -1)
{
View.noteFailed(this, rip);
}
}
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Done with the COMMIT message in group " + this.gname);
}
if (this.theView.GetMyRank() == -1)
{
this.GroupClose();
}
}));
this.doRegister(Vsync.COMMIT, new Action<Vsync.ViewDelta[]>(vds =>
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
vds = vds.Where(vd => vd.gaddr == this.gaddr && vd.prevVid >= (theView == null ? -1 : theView.viewid)).ToArray();
if (vds.Length == 0)
{
return;
}
int initialSeqn = this.curMsgId();
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.TOKENLOGIC)) != 0)
{
Vsync.WriteLine("Received a new LGCOMMIT message in group <" + this.gname + "> ViewDelta vector has " + vds.Length + " updates:");
foreach (Vsync.ViewDelta vd in vds)
{
Vsync.WriteLine(" " + vd);
}
}
if (vds.Length > 0)
{
this.gotNewViewDeltas(vds);
}
if (this.theToken != null)
{
this.theToken.logicalClock++;
this.theToken.resetStableByLevel(this);
}
using (var tmpLockObj = new LockAndElevate(this.CommitLock))
{
Vsync.CommitGVUpdates(this, vds, ref theView);
}
foreach (Vsync.ViewDelta vd in vds)
{
if (vd.joiners.Length > 0)
{
if (vd.prevVid == -1)
{
Vsync.WriteLine("Calling GetMap(" + this.gname + ", true) in Vsync.COMMIT");
MCMDSocket.GetMap(this.gaddr, true);
}
if (this.IAmRank0())
{
Address[] AllJoiners = Vsync.Expand(vd.joiners);
foreach (Address who in AllJoiners)
{
byte[] ba = Msg.toBArray(Vsync.INITIALVIEW, Vsync.my_address, theView, MCMDSocket.GetMap(this.gaddr, false), initialSeqn);
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.TOKENLOGIC)) != 0)
{
Vsync.WriteLine("LargeGroup owner for <" + this.gname + "> sending INITIALVIEW with initialSeqn=" + initialSeqn + " to " + who + "(view = " + theView + ")");
}
ReliableSender.SendP2P(Msg.ISGRPP2P, who, this, theView.viewid, ReliableSender.P2PSequencer.NextP2PSeqn("initialview/lg", who), ba, true, null, null);
}
}
}
}
Vsync.Proposed = null;
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.TOKENLOGIC)) != 0)
{
Vsync.WriteLine("Done with the LGCOMMIT message in group " + this.gname);
}
}));
this.doRegister(Vsync.INQUIRE, new Action<Address>(newLeader =>
{
// This runs only in the ORACLE. It uses a separate thread because it sends a reply, and we were worried about
// obscure deadlock risks if the reliable sender subsystem had to wait on too many asynchronous p2p messages.
// The ORACLE runs only the leader-based 2pc/3pc protocol (this is the first-phase message of the 3pc case,
// sent when a new leader takes over) and the group is otherwise "idle", so using a new thread makes
// no difference to event ordering.
Thread t = new Thread(() =>
{
try
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Received a new INQUIRE request in group " + this.gname + " from " + newLeader);
}
int hisRank = this.theView.GetRankOf(newLeader);
if (hisRank == -1)
{
throw new VsyncException("INQUIRE from a new leader who isn't in the group (or isn't alive)!" + VsyncSystem.GetState());
}
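// A new leader can take over only if every member ranked ahead of it has failed, so cover ranks 0..hisRank-1
for (int i = 0; i < hisRank; i++)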
{
if (!this.theView.hasFailed[i])
{
Vsync.NodeHasFailed(this.theView.members[i], "(from INQUIRE)", true);
}
}
Vsync.ViewDelta[] proposed = Vsync.Proposed ?? new Vsync.ViewDelta[0];
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Reply to INQUIRE request in group " + this.gname + " with " + proposed.Length + " events");
}
this.doReply(Vsync.my_address, Vsync.ORACLE.theView.members[0], proposed);
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "Receive INQUIRE about existing known view updates", IsBackground = true };
this.SetReplyThread(t);
t.Start();
}));
this.doRegister(Vsync.INITIALVIEW, new Action<Address, string[], Address[], long[], View[], bool[], int, int[,], Address[]>((sender, names, gaddrs, tsigs, vs, isl, curPhysIPA, mms, CBMs) =>
{
VsyncSystem.VsyncRestarting = false;
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.STARTSEQ | VsyncSystem.MCMDMAP)) != 0)
{
Vsync.WriteLine("Received a new ORACLE-member INITIALVIEW event in group " + this.gname + ", Vsync.ClientOf is " + (Vsync.ClientOf == null ? "null" : "non-null (" + Vsync.ClientOf + ")")
+ ", CanBeOracleList=" + Address.VectorToString(CBMs));
for (int i = 0; i < names.Length; i++)
{
Vsync.WriteLine(" .... <" + names[i] + ">: " + (isl[i] ? "(long) " : string.Empty) + gaddrs[i] + ", View=" + vs[i]);
}
Vsync.WriteLine("New ORACLE-member initial MCMD Mapping information:");
for (int i = 0; i < mms.GetLength(0); i++)
{
Vsync.WriteLine(" Map virtual address " + MCMDSocket.PMCAddr(mms[i, MCMDSocket.VIRTUAL]) + " to " + MCMDSocket.PMCAddr(mms[i, MCMDSocket.PHYSICAL]));
}
}
if (Vsync.ClientOf != null)
{
throw new VsyncException("ORACLE initial-view event unexpected when in client-of mode");
}
using (var tmpLockObj = new LockAndElevate(Vsync.CanBeOracleListLock))
{
foreach (Address a in CBMs)
{
Vsync.CanBeOracleList.Add(a);
}
}
MCMDSocket.nextPhysIPAddr = curPhysIPA;
for (int i = 0; i < names.Length; i++)
{
if (names[i].Equals("ORACLE", StringComparison.Ordinal))
{
Group g = Vsync.ORACLE;
Vsync.LeaderId = vs[i].leaderId;
if (!g.HasFirstView || g.theView.viewid < vs[i].viewid)
{
g.NewView(vs[i], "ORACLE", new[] { mms[i, MCMDSocket.VIRTUAL], mms[i, MCMDSocket.PHYSICAL] });
g.ReplayToDo();
}
if (g.theView.GetRankOf(Vsync.my_address) == -1)
{
throw new VsyncException("ORACLE initial-view GROUP LIST: oracle didn't list me" + Environment.NewLine + VsyncSystem.GetState());
}
}
else
{
Group tpg = Group.TrackingProxyLookup(gaddrs[i]) ?? Group.TrackingProxy(names[i], "INITIALVIEW[]", gaddrs[i], tsigs[i], null, vs[i], isl[i] ? G_ISLARGE : 0, false);
if (tpg != null && (!tpg.HasFirstView || tpg.theView.members.Length == 0 || tpg.theView.viewid <= vs[i].viewid))
{
tpg.NewView(vs[i], "view[] initializer", new[] { mms[i, MCMDSocket.VIRTUAL], mms[i, MCMDSocket.PHYSICAL] });
}
if (tpg != null && !tpg.hasPhysMapping && mms[i, MCMDSocket.PHYSICAL] != MCMDSocket.UNKNOWN)
{
tpg.hasPhysMapping = true;
MCMDSocket.SetMap("ORACLE initializer", tpg.gname, true, new[] { mms[i, MCMDSocket.VIRTUAL], mms[i, MCMDSocket.PHYSICAL] });
}
}
}
MCMDSocket.AssignMapInfo(mms);
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.STARTSEQ | VsyncSystem.MCMDMAP)) != 0)
{
Vsync.WriteLine("Finished in ORACLE-member Vsync.INITIALVIEW: my state is " + VsyncSystem.GetState());
}
}));
this.doRegister(Vsync.INITIALVIEW, new Action<Address, View, int[], int>(this.AcceptInitialView));
this.doRegister(Vsync.INITIALVIEW, new Action<Address, string, int[], int>((sender, OOBFname, mm, initialseqn) =>
{
if ((VsyncSystem.Debug & VsyncSystem.OOBXFERS) != 0)
{
Vsync.WriteLine("INITIALVIEW via OOB: calling OOBFetch for <" + OOBFname + ">");
}
new Thread(() =>
{
MemoryMappedFile mmf = this.OOBFetch(OOBFname);
if (mmf != null)
{
byte[] ba;
// Dispose the view accessor as soon as the bytes have been copied out of the mapped file
using (MemoryMappedViewAccessor mmva = mmf.CreateViewAccessor())
{
ba = new byte[mmva.Capacity];
mmva.ReadArray(0, ba, 0, ba.Length);
}
this.AcceptInitialView(sender, new View(ba), mm, initialseqn);
if ((VsyncSystem.Debug & VsyncSystem.OOBXFERS) != 0)
{
Vsync.WriteLine("INITIALVIEW via OOB: after accepting initial view for <" + OOBFname + ">");
}
}
else
{
throw new VsyncException("OOBFetch failed for " + OOBFname);
}
}) { Name = "Fetch INITIALVIEW", IsBackground = true }.Start();
}));
this.doRegister(Vsync.JOINFAILED, new Action<Address, string>((gaddr, reason) =>
{
Group g = Group.doLookup(gaddr);
g.joinFailed = true;
g.reason = reason;
ILock.Barrier(ILock.LLINITV, gaddr).BarrierReleaseAll();
}));
this.doRegister(Vsync.INITIALVIEW, new Action<Address>(sender =>
{
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.STARTSEQ)) != 0)
{
Vsync.WriteLine("Received a new client-of INITIALVIEW event in group " + this.gname + ", and Vsync.ClientOf is " + (Vsync.ClientOf == null ? "null" : "non-null (" + Vsync.ClientOf + ")") + " and <ORACLE>=" + Vsync.ORACLE);
}
if (sender.isMyAddress())
{
throw new VsyncException("Client-of myself!" + Environment.NewLine + VsyncSystem.GetState());
}
if (Vsync.ORACLE.theView != null && Vsync.ORACLE.theView.GetMyRank() != -1)
{
Vsync.WriteLine("ORACLE owner told me to be a client, then changed his mind (WARNING only)");
return;
}
VsyncSystem.VsyncRestarting = false;
Group g;
if ((g = Group.doLookup("ORACLE")) != null && g.HasFirstView)
{
g.Leave();
}
else if (g != null)
{
g.GroupClose();
}
Vsync.ClientOf = sender;
Vsync.OracleFailedAt = 0;
Vsync.ORACLE = TrackingProxy("ORACLE", "Client-Of initializer", Vsync.ORACLE.gaddr, 0, null, new View("ORACLE", Vsync.ORACLE.gaddr, new[] { sender }, 0, (this.flags & G_ISLARGE) != 0), this.flags, false);
ILock.Barrier(ILock.LLWAIT, ILock.LCLIENTOF).BarrierRelease(1);
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("Finished in Vsync.INITIALVIEW (client-of case): my state is " + VsyncSystem.GetState());
}
}));
this.doRegister(Vsync.STATEXFER, new Action(() =>
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Received end-of-statexfer (EndXFER) in group " + this.gname);
}
this.EndStateXfer();
}));
this.doRegister(Vsync.CRYPTOWRAPPED, new Action<byte[]>(ciphered =>
{
// A crypto-wrapped message is a p2p message with an enciphered body. Decrypt it, then redeliver using the "true" body
if (this.myAes == null)
{
throw new VsyncException("Received an enciphered object but this group member didn't know the key!");
}
byte[] buffer = this.decipherBuf(ciphered);
if (buffer != null && buffer.Length > 0)
{
this.Redeliver(buffer);
}
else
{
using (var tmpLockObj = new LockAndElevate(this.GroupFlagsLock))
{
if ((this.flags & G_NEEDSTATEXFER) != 0)
{
this.EndStateXfer();
}
}
}
}));
this.doRegister(Vsync.CLIENTWRAPPED, new Action<string, int, byte[]>((gname, rcode, clientreq) =>
{
// A client-wrapped message is one sent from the Vsync "Client of a group" API; it arrives in Vsync.VSYNCMEMBERS but needs to be redelivered in the
// actual target group, which involves moving the true "reply-to" message from VSYNCMEMBERS to the target group, and also getting the redeliver API
// to check for redelivery information on the redelivery queue of VSYNCMEMBERS rather than the one associated with the target group
Group g = Group.Lookup(gname);
if (g != null)
{
Msg rmsg = this.getReplyToAndClear();
if (rmsg != null)
{
rmsg.gaddr = this.gaddr;
g.setReplyTo(rmsg);
}
if (g.allowsClientRequests[rcode + Vsync.SYSTEMREQS])
{
g.Redeliver(clientreq, this.rdiLock, this.rdiList);
}
else
{
g.AbortReply("Group has not enabled client requests to request code " + rcode + " (did you forget to call g.AllowClientRequests?)");
}
}
else
{
this.AbortReply("Group <" + gname + "> not found");
}
}));
this.doRegister(Vsync.FRAGMENT, new Action<Address, int, long, int, int, byte, byte[]>((sender, fid, tl, nf, fn, fflags, frag) => deFragGotFrag(this, sender, fid, tl, nf, fn, fflags, frag)));
this.doRegister(Vsync.REMAP, new Action<int, MCMDSocket.GRPair[]>((id, grps) =>
{
if ((VsyncSystem.Debug & VsyncSystem.MCMDMAP) != 0)
{
Vsync.WriteLine("Recomputing MCMD Mapping!");
}
MCMDSocket.ComputeMCMDMapping(id, grps);
}));
this.doRegister(Vsync.REMAP, new Action<int, int, int[,]>((epochId, nip, ma) =>
{
MCMDSocket.nextPhysIPAddr = nip;
Vsync.MapperEpochId = Math.Max(Vsync.MapperEpochId, epochId + 1);
MCMDSocket.ReMap(epochId, ma);
}));
this.doRegister(Vsync.REMAP, new Action<int[,]>(ma =>
{
using (var tmpLockObj = new LockAndElevate(Group.VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
Group g = kvp.Value;
int virtIPAddr = g.myVirtIPAddr;
if (virtIPAddr == MCMDSocket.UNKNOWN)
{
virtIPAddr = g.myVirtIPAddr = Vsync.CLASSD + Vsync.VSYNC_MCRANGE_LOW + Address.GroupNameHash(g.gname);
}
if (g.myPhysIPAddr == MCMDSocket.UNKNOWN)
{
for (int i = 0; i < ma.GetLength(0); i++)
{
if (ma[i, MCMDSocket.VIRTUAL] == virtIPAddr)
{
g.myVirtIPAddr = virtIPAddr;
g.myPhysIPAddr = ma[i, MCMDSocket.PHYSICAL];
}
}
}
}
}
}));
this.doRegister(Vsync.TERMINATE, new Action(() =>
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Group <" + this.gname + "> received TERMINATE event" + VsyncSystem.GetState());
}
if ((this.flags & G_ISLARGE) != 0 && !this.TermSent && this.IAmLeader())
{
this.TermSent = true;
this.doSendNotFromOracle(Vsync.TERMINATE);
}
this.GroupClose();
}));
this.doRegister(Vsync.TERMINATE, new Action<Address>(obj =>
{
this.PrepareForTermination();
this.doNullReply();
}));
this.doRegister(Vsync.FANNOUNCE, new Action<Address>(which =>
{
ReliableSender.NodeHasFailed(which);
GroupNoteFailure(this, which);
}));
this.doRegister(Vsync.OUTOFBAND, new Action<string, Address, long, List<Address>>(this.doOOBUpdateRegistry));
this.doRegister(Vsync.OUTOFBAND, new Action<List<OOBRepInfo>>(this.doOOBUpdateRegistry));
this.doRegister(Vsync.OUTOFBAND, new Action<string, long, long, byte[], bool>(doOOBGotChunk));
this.doRegister(Vsync.OUTOFBAND, new Action(doOOBSetupTCPListener));
this.doRegister(Vsync.OUTOFBAND, new Action<string, string, Address, int, long, List<IPAddress>, Address, Address[], int[], int>(this.doOOBPrepare));
this.doRegister(Vsync.OUTOFBAND, new Action<string, int>(this.doReportChunkStatus));
this.doRegister(Vsync.OUTOFBAND, new Action<string>(this.doOOBDone));
this.doRegister(Vsync.OUTOFBAND, new Action<string, bool>(this.doOOBDone));
this.doRegister(Vsync.OUTOFBAND, new Action<IPAddress, string>((ipa, fname) => this.reportWhenXferDone(ipa, fname)));
this.doRegister(Vsync.CAUSALSEND, new Action<int, int[], Msg>((vid, theVT, theMsg) =>
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (vid != theView.viewid)
{
// If CausalSend shows up in a different view than the one in which it was sent, the prior messages will
// have been delivered by virtue of the group flush protocol having run to completion during the view change
this.incomingSends.put(new Msg(theMsg));
return;
}
Msg cm = this.curMsg();
if (cm == null)
{
throw new VsyncException("CausalSend: curMsg null");
}
int senderRank = theView.GetRankOf(cm.sender);
if ((VsyncSystem.Debug & VsyncSystem.CAUSALDELIVERY) != 0)
{
Vsync.WriteLine("Got new CausalSend with viewid=" + vid + ", VT=" + VTtoString(theVT) + " from sender with rank " + senderRank + ", my VT=" + VTtoString(theView.myVT));
}
List<ctuple> newCList = new List<ctuple>();
List<ctuple> toDeliver = new List<ctuple>();
ctuple myCT = new ctuple(senderRank, theVT, theMsg);
using (var tmpLockObj = new LockAndElevate(this.CausalOrderListLock))
{
bool inserted = false;
foreach (ctuple ct in this.CausalOrderList)
{
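if (!inserted && this.happensBefore(theVT, ct.theVT))
{
// Insert the new tuple ahead of the first queued tuple it causally precedes. myCT is left
// non-null so that the "needed to delay" check after delivery can still recognize it.
inserted = true;
newCList.Add(myCT);
}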
newCList.Add(ct);
}
if (!inserted)
{
newCList.Add(myCT);
}
this.CausalOrderList = newCList;
this.CausalOrderListCount = this.CausalOrderList.Count;
bool fnd;
do
{
fnd = false;
List<ctuple> oldClist = this.CausalOrderList;
this.CausalOrderList = new List<ctuple>();
this.CausalOrderListCount = 0;
foreach (ctuple ct in oldClist)
{
if (this.isDeliverable(ct.senderRank, ct.theVT))
{
fnd = true;
if ((VsyncSystem.Debug & VsyncSystem.CAUSALDELIVERY) != 0)
{
Vsync.WriteLine("Causal delivery with viewid=" + theView.viewid + ", VT=" + VTtoString(ct.theVT) + " of " + ct.theMsg + ", my VT=" + VTtoString(theView.myVT));
}
toDeliver.Add(ct);
}
else
{
this.CausalOrderList.Add(ct);
this.CausalOrderListCount++;
}
}
}
while (fnd);
}
bool dme = false;
foreach (ctuple ct in toDeliver)
{
if (ct == myCT)
{
dme = true;
}
this.incomingSends.put(new Msg(ct.theMsg));
}
if (!dme && (VsyncSystem.Debug & VsyncSystem.CAUSALDELIVERY) != 0)
{
Vsync.WriteLine("Needed to delay CausalSend with viewid=" + vid + ", VT=" + VTtoString(theVT) + " from sender with rank " + senderRank + ", my VT=" + VTtoString(theView.myVT));
}
}));
this.doRegister(Vsync.ORDEREDSEND, new Action<Msg>(theMsg =>
{
this.receivedOrderedSends = true;
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("Vsync.ORDEREDSEND[" + this.curMsg().sender + "::" + this.curMsg().vid + ":" + this.curMsg().msgid + "]: Request to establish ordering for " + theMsg.sender + "::" + theMsg.vid + ":" + theMsg.msgid);
}
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
this.OutOfOrderQueue.Add(theMsg);
++this.OutOfOrderQueueCount;
}
if (this.theView.IAmRank0() && !this.onDOQ(theMsg) && (this.flags & G_WEDGED) == 0)
{
Address[] senders;
int[] vids, msgids;
this.GenerateOrdering(out senders, out vids, out msgids, false);
// By using Send instead of doSend, this code deliberately allows the Send to be blocked if the group
// is wedged for a membership change (doSend ignores that kind of wedging). In such cases upon receipt,
// the corresponding message will already have been delivered. But this also forces creation of a new task
// to avoid blocking the group's multicast delivery thread
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("Vsync.ORDEREDSEND: GeneratedOrdering<" + this.gname + ">=" + this.OrderToString(senders, vids, msgids));
}
if (senders.Length > 0)
{
this.NonFlowControlledSend(Vsync.SETORDER, Vsync.my_address, senders, vids, msgids);
}
}
else
{
// This may be what we were waiting for, deliver whatever we now can deliver without a new SETORDER
this.DeliverInOrder("Vsync.ORDEREDSEND(nulls)", new Address[0], new int[0], new int[0]);
}
}));
// These versions are for the Ordered subset multicast.
this.doRegister(Vsync.ORDEREDSEND, new Action<List<Address>, long, Msg>((dests, hisTS, m) =>
{
osspq opq;
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("ORDEREDSEND==> Got dests=" + Address.VectorToString(dests.ToArray()) + ", id=" + m.sender + "::" + m.vid + ":" + m.msgid + ", m=" + m);
}
using (var tmpLockObj = new LockAndElevate(this.OrderedSubsetListLock))
{
if (hisTS > this.myTS)
{
this.myTS = hisTS;
}
opq = new osspq(++this.myTS, Vsync.my_address, dests, m.sender, m.vid, m.msgid);
this.OrderedSubsetPQ.Add(opq, m);
++this.OrderedSubsetPQCount;
}
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("ORDEREDSEND==> doReply(" + opq.proposedTS + "::" + opq.proposedWho + ")");
}
this.doReply(opq.proposedTS, opq.proposedWho);
}));
// Learned the commit time. No need for locks: this runs only on the P2P delivery thread
this.doRegister(Vsync.ORDEREDSEND, new Action<Address, int, int, long, Address>((sentBy, vid, msgid, cts, ctwho) =>
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("ORDEREDSEND==>COMMIT (" + sentBy + "::" + vid + ":" + msgid + ") at (" + cts + "::" + ctwho + ")");
}
this.opqDeliver(sentBy, vid, msgid, cts, ctwho);
}));
this.doRegister(Vsync.ORDEREDSEND, new Action<List<osspq>>(recentOpqNds =>
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("ORDEREDSEND==>got flush finalizer containing:");
foreach (osspq opq in recentOpqNds)
{
Vsync.WriteLine("ORDEREDSEND==>COMMIT (" + opq.sender + "::" + opq.vid + ":" + opq.msgid + ") at (" + opq.proposedTS + "::" + opq.proposedWho + ")");
}
}
foreach (osspq opq in recentOpqNds)
{
this.opqDeliver(opq.sender, opq.vid, opq.msgid, opq.proposedTS, opq.proposedWho);
}
}));
this.FlushHandlers += view =>
{
Dictionary<Address, List<osspq>> toSend = new Dictionary<Address, List<osspq>>(100);
using (var tmpLockObj = new LockAndElevate(this.RecentOpqNodesLock))
{
foreach (osspq opq in this.RecentOpqNodes)
{
if (!opq.cflag)
{
throw new VsyncException("Found an uncommitted opq node on the RecentOpqNodes list");
}
List<osspq> theList;
if (!toSend.TryGetValue(opq.sender, out theList))
{
toSend.Add(opq.sender, new List<osspq> { opq });
}
else
{
theList.Add(opq);
}
}
}
foreach (KeyValuePair<Address, List<osspq>> kvp in toSend)
{
this.P2PSend(kvp.Key, Vsync.ORDEREDSEND, kvp.Value);
}
this.Flush();
};
this.doRegister(Vsync.SETORDER, new Action<Address, Address[], int[], int[]>((orderedBy, senders, vids, msgids) =>
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Msg theMsg = this.curMsg();
Vsync.WriteLine("Vsync.SETORDER: Received " + theMsg.sender + "::" + theMsg.vid + ":" + theMsg.msgid + ", containing " + this.OrderToString(senders, vids, msgids));
}
this.DeliverInOrder("Vsync.SETORDER", senders, vids, msgids);
}));
this.doRegister(Vsync.SAFESEND, new Action<Address, int, Msg>((sender, Uid, sMsg) =>
{
int myTS;
using (var tmpLockObj = new LockAndElevate(this.SSLock))
{
myTS = ++this.logicalTS;
this.SSList.Add(new SUTW(sender, Uid, myTS, Vsync.my_address), sMsg);
}
if ((VsyncSystem.Debug & VsyncSystem.SAFESEND) != 0)
{
Vsync.WriteLine("Vsync.SAFESEND: Got " + sender + "::" + Uid + " for pending message " + sMsg.sender + "::" + sMsg.vid + ":" + sMsg.msgid + ", my proposal: " + Vsync.my_address + "::" + myTS);
}
if (this.durabilityMethod != null)
{
sMsg.ct = this.durabilityMethod.LogMsg(sMsg);
}
if (this.safeSendThreshold == ALL || this.theView.GetMyRank() < this.safeSendThreshold)
{
this.doReply(myTS, Vsync.my_address);
}
else
{
this.NoReply();
}
}));
this.doRegister(Vsync.SAFEDELIVER, new Action<Address, int, int, Address>((sender, Uid, TS, who) =>
{
if ((VsyncSystem.Debug & VsyncSystem.SAFESEND) != 0)
{
Vsync.WriteLine("Vsync.SAFEDELIVER: Got final TS " + who + "::" + TS + " for pending message " + sender + "::" + Uid);
}
bool fnd = false;
using (var tmpLockObj = new LockAndElevate(this.SSLock))
{
foreach (KeyValuePair<SUTW, Msg> kvp in this.SSList)
{
SUTW suw = kvp.Key;
Msg sMsg = kvp.Value;
if (suw.Sender == sender && suw.Uid == Uid)
{
this.SSList.Remove(suw);
suw.TS = TS;
suw.Who = who;
suw.commitFlag = true;
this.SSList.Add(suw, sMsg);
if (TS > this.logicalTS)
{
this.logicalTS = TS;
}
fnd = true;
break;
}
}
if (fnd)
{
this.deliverSSItems();
}
}
}));
this.doRegister(Vsync.LOCKREQ, new Action<int, string, Address, int, int>((action, lockName, who, cntr, locktype) =>
{
LockInfo lstate;
bool grantIt = false;
bool cancelIt = false;
LockingInUse = true;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
if (!this.LocksList.TryGetValue(lockName, out lstate))
{
lstate = new LockInfo(lockName);
this.LocksList[lockName] = lstate;
}
Msg cm = this.curMsg();
string minfo = "[" + (cm == null ? "<<null msg>>" : (cm.sender + "::" + cm.vid + ":" + cm.msgid)) + "]";
// cm may be null (minfo above already allows for that), so guard the dereference
if (cm != null && cm.vid != this.lockCurrentViewid)
{
throw new VsyncException(minfo + ": ERROR in LOCKREQ -- message has vid " + cm.vid + ", but LockViewId = " + this.lockCurrentViewid);
}
switch (action)
{
case LOCKIT:
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine(minfo + "; LOCKIT<" + (cntr / 10000) + "." + (cntr % 1000) + ">: " + lockName + ", who=" + who);
}
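LockReq lr = new LockReq(who, locktype);
bool alreadyWaiting = false;
foreach (LockReq wl in lstate.wantLock)
{
if (wl.who == who)
{
// Only one request is permitted at a time
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("Only a single Lock request is permitted at a time. Ignoring request " + who + "::" + ltype[locktype]);
}
alreadyWaiting = true;
break;
}
}
if (alreadyWaiting)
{
// The break above only leaves the foreach; skip the enqueue so the duplicate really is ignored
break;
}
lstate.wantLock.Add(lr);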
if (!lstate.islocked || (lstate.locktype == READLOCK && lr.how == READLOCK))
{
grantIt = true;
}
break;
case RELEASEIT:
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine(minfo + "; RELEASEIT<" + (cntr / 10000) + "." + (cntr % 1000) + ">: " + lockName + ", who=" + who);
}
if (!lstate.holders.Contains(who))
{
throw new VsyncException("Locking package exception: RELEASE by " + who + " of " + lockName + (lstate.islocked ? (" held by " + Address.VectorToString(lstate.holders.ToArray())) : " but it wasn't locked"));
}
lstate.holders.Remove(who);
if (lstate.holders.Count == 0)
{
lstate.islocked = false;
grantIt = true;
}
break;
case CANCEL:
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine(minfo + "; CANCEL<" + (cntr / 10000) + "." + (cntr % 1000) + ">: " + lockName + ", who=" + who);
}
foreach (LockReq wl in lstate.wantLock)
{
if (wl.who == who)
{
cancelIt = true;
lstate.wantLock.Remove(wl);
break;
}
}
break;
}
}
if (grantIt && lstate.wantLock.Count > 0)
{
this.GrantLock(lockName, lstate);
}
else if (cancelIt)
{
this.CancelLockRequest(lockName, who, lstate);
}
}));
this.doRegister(Vsync.SGAGGREGATE, new Action<Address, bool, int, int, byte[]>((from, fromBelow, level, vid, ba) =>
{
object[] objs = Msg.BArrayToObjects(ba);
object key = objs[0];
object value = objs[1];
int thevid;
int lcnt = 0;
do
{
if (lcnt++ > 0)
{
// When a new view is delivered it takes time to reinitialize the aggregation structures and to
// unwind existing aggregation waits; meanwhile, an early SGAGGREGATE message could sneak through. This clumsy
// mechanism rechecks the view id once per second, for up to 10 attempts (about 9 seconds of waiting).
Vsync.Sleep(1000);
}
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
thevid = this.theView.viewid;
}
}
while (vid > thevid && lcnt < 10);
if (vid > thevid)
{
throw new VsyncException("SGAGGREGATE: received a p2p message for view " + vid + " but I seem to be stuck in view " + thevid);
}
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("SGAGGREGATE from " + from + ": level=" + level + ", " + (fromBelow ? "DValue" : "RValue") + ", vid=" + vid + " (current vid " + thevid + "), key=" + key + ", value=" + value);
}
IAggregateEventHandler callme = null;
using (var tmpLockObj = new LockAndElevate(this.AggListLock))
{
// Valid indices are 0..Length-1, so level == Length must be rejected too
if (level < 0 || level >= this.AggList.Length)
{
return;
}
foreach (IAggregateEventHandler iae in this.AggList[level])
{
if (iae.GetKeyType() == key.GetType() && iae.GetValueType() == value.GetType())
{
callme = iae;
break;
}
}
}
if (callme != null)
{
callme.GotSGAggInfo(fromBelow, level, vid, key, value);
}
else if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("SGAGGREGATE: WARNING.... couldn't find the callback handler");
}
}));
this.RegisterMakeChkpt(view =>
{
if (!this.LockingInUse)
{
return;
}
string[] lnames, wnames = null;
LockInfo[] linfo = null;
LockReq[] reqs = null;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
lnames = this.LocksList.Keys.ToArray();
if (lnames != null)
{
linfo = this.LocksList.Values.ToArray();
int wcnt = 0;
foreach (LockInfo li in linfo)
{
wcnt += li.wantLock.Count;
}
wnames = new string[wcnt];
reqs = new LockReq[wcnt];
wcnt = 0;
foreach (LockInfo li in linfo)
{
foreach (LockReq a in li.wantLock)
{
wnames[wcnt] = li.name;
reqs[wcnt++] = a;
}
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("Sending lock state<" + this.gname + ">:" + Environment.NewLine + this.GetLockState());
}
if (lnames != null)
{
this.SendChkpt(lnames, linfo, wnames, reqs);
}
});
this.RegisterLoadChkpt(new Action<string[], LockInfo[], string[], LockReq[]>((keys, linfo, names, reqs) =>
{
this.LockingInUse = true;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
for (int n = 0; n < keys.Length; n++)
{
this.LocksList[keys[n]] = linfo[n];
}
for (int n = 0; n < names.Length; n++)
{
this.LocksList[names[n]].wantLock.Add(reqs[n]);
}
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("After loading lock state<" + this.gname + ">:" + Environment.NewLine + this.GetLockState());
}
}));
// This is the relaying logic from a client of a group to the group representative
this.doRegister(Vsync.RELAYSEND, new Action<Msg>(m =>
{
if ((VsyncSystem.Debug & VsyncSystem.RELAYLOGIC) != 0)
{
Vsync.WriteLine("Group owner relaying multicast(" + m + ")");
}
this.doSend(false, false, Vsync.RELAYSEND, Vsync.my_address, new Msg(m));
}));
this.doRegister(Vsync.RELAYSEND, new Action<Address, Msg>((leader, m) =>
{
// Runs in all the members
if ((VsyncSystem.Debug & VsyncSystem.RELAYLOGIC) != 0)
{
Vsync.WriteLine("Large Group owner delivering relayed multicast(" + m + ")");
}
List<Msg> newRelayedLGSends = new List<Msg>();
using (var tmpLockObj = new LockAndElevate(this.RelayedLGSendsLock))
{
foreach (Msg rm in this.RelayedLGSends)
{
if (rm != m && rm.vid != -1 && rm.msgid != -1)
{
newRelayedLGSends.Add(rm);
}
}
this.RelayedLGSends = newRelayedLGSends;
}
if ((VsyncSystem.Debug & VsyncSystem.DELIVERY) != 0)
{
Vsync.WriteLine("<" + this.gname + ">: UnpackAndDeliver delivering " + m);
}
this.doDeliveryCallbacks(m, "from UnpackAndDeliver (for relayed large-group sends)", Msg.MULTICAST);
}));
if ((this.flags & G_ISLARGE) != 0)
{
this.RegisterViewHandler(view =>
{
tokenInfo theToken;
using (var tmpLockObj = new LockAndElevate(this.TokenLock))
{
theToken = this.theToken;
}
if (theToken != null)
{
if (this.prevLGOwner != null && theToken.groupOwner == this.prevLGOwner)
{
return;
}
this.prevLGOwner = theToken.groupOwner;
List<Msg> LGResendList = new List<Msg>();
using (var tmpLockObj = new LockAndElevate(this.RelayedLGSendsLock))
{
foreach (Msg rm in this.RelayedLGSends)
{
LGResendList.Add(rm);
}
}
foreach (Msg m in LGResendList)
{
this.P2PSend(theToken.groupOwner, Vsync.RELAYSEND, m);
}
}
});
}
this.ViewHandlers += this.LockNewView;
this.groupP2PReaderThread = new Thread(() =>
{
try
{
this.isP2PThread = true;
this.doDelivery(this.incomingP2P, "incomingP2P");
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "<" + this.gname + "> incoming p2p delivery thread", IsBackground = true };
this.groupP2PReaderThread.Start();
this.groupIPMCReaderThread = new Thread(() =>
{
try
{
this.doDelivery(this.incomingSends, "incomingSends");
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "<" + this.gname + "> incoming multicasts delivery thread", IsBackground = true };
this.groupIPMCReaderThread.Start();
}
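// Installs the first view for this group. Once HasFirstView is set, later INITIALVIEW messages are ignored.
private void AcceptInitialView(Address sender, View theView, int[] mm, int initialSeqn)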
{
if (this.HasFirstView)
{
return;
}
VsyncSystem.VsyncRestarting = false;
if ((this.flags & G_ISLARGE) != 0)
{
theView.NextIncomingMsgID[1] = initialSeqn;
}
if (this == Vsync.VSYNCMEMBERS)
{
IPMCNewView(this.gaddr, theView);
}
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("Received a new single-group INITIALVIEW event in group " + this.gname + " from " + sender + ", View " + theView + ", with initialSeqn = " + initialSeqn);
}
if (Vsync.WORKER_MODE)
{
this.TypeSig = TypeSignature(this);
this.GroupOpen = true;
}
if (theView.viewid == 0)
{
if (this.myCheckpointFile != null && this.CheckPointFileExists())
{
this.LoadCheckpointFromFile();
}
else if (this.theInitializer != null && !this.initializationDone)
{
this.InitializeGroup(theView);
}
using (var tmpLockObj = new LockAndElevate(this.GroupFlagsLock))
{
if ((this.flags & G_NEEDSTATEXFER) != 0)
{
this.xferWait.Release();
}
this.flags &= ~G_NEEDSTATEXFER;
}
}
if (!theView.joiners.Contains(Vsync.my_address))
{
theView.joiners = new[] { Vsync.my_address };
}
this.NewView(theView, "single-view initializer", mm);
this.TPGroupsLearnedMM(mm);
}
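// Flush-time cleanup for the ordered subset multicast: empties the priority queue (delivering committed
// entries, discarding uncommitted ones), then resets the queue and the RecentOpqNodes list.
private void opqDrain()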
{
this.opqDeliver(Vsync.NULLADDRESS, 0, 0, 0, Vsync.NULLADDRESS, true);
using (var tmpLockObj = new LockAndElevate(this.OrderedSubsetListLock))
{
this.OrderedSubsetPQ = new SortedList<osspq, Msg>();
this.OrderedSubsetPQCount = 0;
}
using (var tmpLockObj = new LockAndElevate(this.RecentOpqNodesLock))
{
this.RecentOpqNodes = new List<osspq>();
}
}
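// Records the commit timestamp (cts, ctwho) for the message sentBy::vid:msgid, re-inserts it so the queue
// is re-sorted, and then delivers from the head of the queue for as long as the head entry is committed.
// With drainOPQ set, the whole queue is emptied and uncommitted entries are dropped.
private void opqDeliver(Address sentBy, int vid, int msgid, long cts, Address ctwho, bool drainOPQ = false)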
{
osspq opq = null;
Msg m = null;
this.myTS = Math.Max(this.myTS, cts);
List<Msg> msgs = new List<Msg>();
List<osspq> keys = new List<osspq>();
using (var tmpLockObj = new LockAndElevate(this.OrderedSubsetListLock))
{
if (!sentBy.isNull())
{
foreach (KeyValuePair<osspq, Msg> kvp in this.OrderedSubsetPQ)
{
if (kvp.Value.sender == sentBy && kvp.Value.vid == vid && kvp.Value.msgid == msgid)
{
opq = kvp.Key;
m = kvp.Value;
break;
}
}
if (opq != null)
{
this.OrderedSubsetPQ.Remove(opq);
--this.OrderedSubsetPQCount;
opq.commit(cts, ctwho);
if (!this.OrderedSubsetPQ.ContainsKey(opq))
{
this.OrderedSubsetPQ.Add(opq, m);
this.OrderedSubsetPQCount++;
}
}
else if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("opqDeliver CAN'T FIND (" + sentBy + "::" + vid + ":" + msgid + ", cts=" + cts + "::" + ctwho + ")");
}
}
while (this.OrderedSubsetPQ.Count > 0 && (drainOPQ || this.OrderedSubsetPQ.ElementAt(0).Key.cflag))
{
KeyValuePair<osspq, Msg> first = this.OrderedSubsetPQ.ElementAt(0);
this.OrderedSubsetPQ.Remove(first.Key);
this.OrderedSubsetPQCount--;
if (first.Key.cflag && first.Value != null)
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("ORDERED SUBSET CAST: " + (drainOPQ ? "[DRAIN during flush] " : string.Empty) + "Delivery of " + first.Value);
}
first.Value.type = Msg.REDELIVERY;
msgs.Add(first.Value);
}
else if ((VsyncSystem.Debug & (VsyncSystem.ORDEREDSEND | VsyncSystem.DISCARDS)) != 0)
{
Vsync.WriteLine("ORDERED SUBSET CAST: " + (drainOPQ ? "[DISCARD during flush] " : string.Empty) + "Discarding " + first.Value);
}
if (!drainOPQ)
{
keys.Add(first.Key);
}
}
}
foreach (Msg msg in msgs)
{
this.incomingP2P.put(msg);
}
using (var tmpLockObj = new LockAndElevate(this.RecentOpqNodesLock))
{
foreach (osspq key in keys)
{
this.RecentOpqNodes.Add(key);
}
}
}
private void InitializeGroup(View theView)
{
if (this.initializationDone)
{
return;
}
this.initializationDone = true;
if (this.theInitializer != null)
{
this.theInitializer();
}
if (this.myCheckpointFile != null && theView.IAmLeader())
{
this.MakeCheckpoint(theView);
}
}
private string OrderToString(Address[] senders, int[] vids, int[] msgids)
{
string s = "{ ";
for (int n = 0; n < senders.Length; n++)
{
s += senders[n] + "::" + vids[n] + ":" + msgids[n] + " ";
}
return s + "}";
}
// Caller must hold the SSLock
private void deliverSSItems()
{
List<Msg> toDeliver = new List<Msg>();
KeyValuePair<SUTW, Msg> ssItem;
while ((ssItem = this.SSList.ElementAtOrDefault(0)).Value != null && ssItem.Key.commitFlag)
{
if (!this.SSList.Remove(ssItem.Key))
{
throw new VsyncException("SSList.Remove failed");
}
if (this.durabilityMethod != null)
{
this.durabilityMethod.SetOrder(ssItem.Value);
}
Msg m = ssItem.Value;
m.type = Msg.REDELIVERY;
toDeliver.Add(m);
if ((VsyncSystem.Debug & VsyncSystem.SAFESEND) != 0)
{
Vsync.WriteLine("deliverSSItems: delivering " + m.sender + "::" + m.vid + ":" + m.msgid);
}
if (this.incomingSends.LastPushedVID != 0 && m.vid != this.incomingSends.LastPushedVID)
{
throw new VsyncException("Msg.REDELIVERY/SAFESEND pushing " + m.sender + "::" + m.vid + ":" + m.msgid + " while incomingSends.viewid=" + this.incomingSends.LastPushedVID);
}
}
this.incomingSends.putFront(toDeliver);
}
internal void CheckRetainedOpqINodes()
{
List<osspq> newropq = new List<osspq>();
using (var tmpLockObj = new LockAndElevate(this.RecentOpqNodesLock))
{
foreach (osspq opq in this.RecentOpqNodes)
{
if (opq.gtime == 0)
{
// This worries me; better would be a more explicit garbage collection based on
// tracking the completion of pending sends of the ORDEREDSEND commit times and then
// disseminating that information using the same background gossip mechanisms that
// we use to check rates. In fact this is probably pretty safe but as a purist,
// it bugs me that coded in this manner, the system would potentially violate its
// properties if a crash disrupts sending of the ORDEREDSEND commits, and then the
// associated flush doesn't start "soon enough". A further issue is that the list
// itself could get long.
opq.gtime = Vsync.NOW + 60000;
newropq.Add(opq);
}
else if (Vsync.NOW < opq.gtime)
{
newropq.Add(opq);
}
}
this.RecentOpqNodes = newropq;
}
}
private void EndStateXfer()
{
using (var tmpLockObj = new LockAndElevate(this.GroupFlagsLock))
{
if ((this.flags & G_NEEDSTATEXFER) != 0)
{
this.xferWait.Release();
}
this.flags &= ~G_NEEDSTATEXFER;
}
this.ReplayToDo();
}
internal static void GroupNoteFailure(Group g, Address which)
{
int r = -1;
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
if (g.theView != null)
{
r = g.theView.GetRankOf(which);
}
}
if (r != -1)
{
bool doCB = false;
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (!Vsync.RIPList.Contains(which))
{
Vsync.RIPList.Add(which);
doCB = true;
}
}
if (doCB)
{
ILock.NoteFailed(g.gaddr, which);
ReliableSender.P2PSequencer.remoteFailed(which);
ReliableSender.AckNoteFailure(which);
View.noteFailed(g, which);
AwaitReplies.doNoteFailure(which);
Group.deFragNoteFailure(which);
}
}
}
// VTa happens before VTb iff
// For all i: VTa[i] <= VTb[i]
// For some i: VTa[i] < VTb[i]
internal bool happensBefore(int[] VTa, int[] VTb)
{
bool rv = this._happensBefore(VTa, VTb);
if ((VsyncSystem.Debug & VsyncSystem.CAUSALDELIVERY) != 0)
{
string vta = VTtoString(VTa);
string vtb = VTtoString(VTb);
Vsync.WriteLine("happensBefore(" + vta + ", " + vtb + ")=" + rv);
}
return rv;
}
private static string VTtoString(int[] VT)
{
string vt = "[ ";
foreach (int i in VT)
{
vt += i.ToString("D3") + " ";
}
return vt + "]";
}
internal bool _happensBefore(int[] VTa, int[] VTb)
{
bool res = false;
if (VTa.Length != VTb.Length)
{
throw new VsyncException("happensBefore: VT lengths don't match");
}
for (int i = 0; i < VTa.Length; i++)
{
if (VTa[i] > VTb[i])
{
return false;
}
if (VTa[i] < VTb[i])
{
res = true;
}
}
return res;
}
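// Worked example for the happens-before test above (illustrative values, not taken from Vsync):
//   VTa = [1, 0, 2], VTb = [1, 1, 2]: no entry of VTa exceeds VTb and VTa[1] < VTb[1], so VTa happens before VTb.
//   VTa = [1, 2, 0], VTb = [2, 1, 0]: VTa[1] > VTb[1] and VTb[0] > VTa[0], so neither precedes the other (concurrent).
//   VTa = VTb = [1, 1, 2]: no entry is strictly smaller, so the result is false.
// A sketch of the corresponding calls, assuming a hypothetical Group instance g:
//   bool before = g._happensBefore(new[] { 1, 0, 2 }, new[] { 1, 1, 2 });      // true
//   bool concurrent = !g._happensBefore(new[] { 1, 2, 0 }, new[] { 2, 1, 0 })
//       && !g._happensBefore(new[] { 2, 1, 0 }, new[] { 1, 2, 0 });            // true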
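// An incoming causal send is deliverable if it is the next message from this sender
// and all causally prior messages have been delivered. On success the increment of
// myVT[senderRank] is retained, so the local vector timestamp already reflects the delivery.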
internal bool isDeliverable(int senderRank, int[] VT)
{
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
if (VT.Length != this.theView.myVT.Length || (senderRank < 0 || senderRank >= this.theView.myVT.Length))
{
throw new VsyncException("causal send: isDeliverable fault");
}
// Optimistically assume delivery will be possible
++this.theView.myVT[senderRank];
for (int i = 0; i < VT.Length; i++)
{
if (VT[i] > this.theView.myVT[i])
{
// No luck, back out of optimistic pre-increment
--this.theView.myVT[senderRank];
return false;
}
}
}
return true;
}
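// Worked trace of isDeliverable (illustrative values; a three-member view is assumed):
//   myVT = [2, 1, 0], senderRank = 1, VT = [2, 2, 0]
//     pre-increment gives myVT = [2, 2, 0]; no VT[i] exceeds myVT[i], so the result is true and myVT stays [2, 2, 0].
//   myVT = [2, 1, 0], senderRank = 1, VT = [3, 2, 0]
//     pre-increment gives myVT = [2, 2, 0]; VT[0] = 3 > myVT[0] = 2, so the increment is backed out
//     (myVT returns to [2, 1, 0]) and the result is false: a message from rank 0 is still missing.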
internal void PrepareForTermination()
{
using (var tmpLockObj = new LockAndElevate(this.quiesceLock))
{
if ((this.flags & G_TERMINATING) != 0)
{
return;
}
this.flags |= G_TERMINATING;
}
this.interruptLockWaits.Set();
AwaitReplies.InterruptReplyWaits(this);
this.InterruptAggregationWaits();
while (this.quiesceCnt-- > 0)
{
ILock.NoteThreadState("quiesceWait.WaitOne()");
this.quiesceWait.WaitOne();
ILock.NoteThreadState(null);
}
}
private bool warnedHim;
internal bool VsyncCallStart()
{
if (Thread.CurrentThread != VsyncSystem.ParentThread && VsyncSystem.ParentThread != null)
{
Thread.CurrentThread.IsBackground = true;
}
using (var tmpLockObj = new LockAndElevate(this.quiesceLock))
{
if ((this.flags & G_TERMINATING) != 0)
{
if (!this.warnedHim)
{
Vsync.WriteLine("WARNING: <" + this.gname + "> was closing/terminating but user-code attempted to issue Vsync system calls on it");
}
this.warnedHim = true;
return false;
}
++this.quiesceCnt;
}
return true;
}
internal void VsyncCallDone()
{
using (var tmpLockObj = new LockAndElevate(this.quiesceLock))
{
if ((this.flags & G_TERMINATING) != 0)
{
this.quiesceWait.Release();
}
else
{
--this.quiesceCnt;
}
}
}
internal void GroupClose()
{
if (this.isTrackingProxy)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
TPGroups.Remove(this.gaddr);
}
return;
}
if (VsyncSystem.VsyncActive && (this.gname.Equals("VSYNCMEMBERS", StringComparison.Ordinal) || !this.GroupOpen))
{
return;
}
using (var tmpLockObj = new LockAndElevate(ReliableSender.PendingSendBufferLock))
{
this.GroupOpen = false;
this.WasOpen = true;
}
if (this.durabilityMethod != null)
{
this.durabilityMethod.Shutdown();
}
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
VsyncGroups.Remove(this.gaddr);
}
if (this.theView != null)
{
this.theView.viewid++;
this.theView.leavers = this.theView.members;
this.theView.members = new Address[0];
this.theView.joiners = new Address[0];
// This will cause IPMC delivery thread to deliver a final view and then terminate itself
if (this.incomingSends != null)
{
this.incomingSends.put(this.theView);
this.incomingSends.put(null);
}
if (this.incomingP2P != null)
{
this.incomingP2P.put(this.theView);
this.incomingP2P.put(null);
}
}
if (!this.gname.Equals("ORACLE", StringComparison.Ordinal))
{
using (var tmpLockObj = new LockAndElevate(Group.GroupRIPLock))
{
if (!Group.GroupRIPList.Contains(this.gaddr))
{
Group.GroupRIPList.Add(this.gaddr);
Vsync.OnTimer(Vsync.VSYNC_DEFAULTTIMEOUT * ((this.flags & G_ISLARGE) != 0 ? 10 : 2), () =>
{
using (var tmpLockObj1 = new LockAndElevate(Group.GroupRIPLock))
{
Group.GroupRIPList.Remove(this.gaddr);
}
});
}
}
}
if ((this.flags & G_ISLARGE) != 0)
{
ReliableSender.lgPendingSendCleanup(this);
}
else
{
ReliableSender.PendingSendCleanup(this, null);
}
this.xferWait.Release();
Semaphore s = this.CPSSema;
if (s != null)
{
s.Release(1000);
}
MCMDSocket.UnMap(new[] { this.gaddr });
if (this.theView != null && this.incomingSends != null && this.groupIPMCReaderThread != Thread.CurrentThread)
{
this.groupIPMCReaderThread.Join();
}
if (this.theView != null && this.incomingP2P != null && this.groupP2PReaderThread != Thread.CurrentThread)
{
this.groupP2PReaderThread.Join();
}
fiCleanup(this.gaddr);
}
// Locking package
internal int LockPackageConfig; // Overall policy control for the package
internal const int LOCK_INTERNAL = 0; // Used purely "internally" within a group
// Used with some form of "external" resource but state not recovered (ephemeral) after a total failure
internal const int LOCK_EPHEMERAL_EXTERN = 1;
// Used with an external resource and must be recovered after group crashes, then restarts
internal const int LOCK_RECOVER_EXTERN = 2;
internal const int LOCK_RELEASE = 0; // Lock is released if the member holding it exits.
// By default, if a member holding a lock crashes, the lock is retained and ownership transfers to the rank-zero member
internal const int LOCK_TRANSFER = 1;
internal const int LOCKIT = 0; // Request to obtain a lock
internal const int RELEASEIT = 1; // Release it to a new holder
internal const int CANCEL = 2; // Cancel a pending request
internal const int UNLOCK = 0;
internal const int WRITELOCK = 1;
internal const int READLOCK = 2;
internal static string[] ltype = { "unlocked", "writelock", "readlock" };
internal LockBroken DefaultCallback;
/// <exclude></exclude>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class LockReq
{
/// <exclude></exclude>
[ProtoMember(1)]
public Address who;
/// <exclude></exclude>
[ProtoMember(2)]
public int how;
#if !PROTOCOL_BUFFERS
/// <exclude></exclude>
public LockReq()
{
}
#endif
/// <exclude></exclude>
public LockReq(Address who, int how)
{
this.who = who;
this.how = how;
}
/// <exclude></exclude>
public override string ToString()
{
return this.who + "::" + ltype[this.how];
}
}
/// <exclude></exclude>
internal sealed class LockInfo : IDisposable, ISelfMarshalled
{
internal string name;
internal bool islocked = false;
internal int policy;
internal int locktype;
// If the locktype is WRITE there will be exactly one holder
// For locktype READ there could be a list
internal List<Address> holders = new List<Address>();
internal bool requestPending = false;
internal LockBroken Notify;
internal List<LockReq> wantLock = new List<LockReq>();
internal Semaphore wait = new Semaphore(0, int.MaxValue);
public byte[] toBArray()
{
return Msg.toBArray(this.islocked, this.policy, this.holders, this.locktype);
}
public LockInfo(byte[] ba)
{
object[] objs = Msg.BArrayToObjects(ba, typeof(bool), typeof(int), typeof(List<Address>), typeof(int));
int idx = 0;
this.islocked = (bool)objs[idx++];
this.policy = (int)objs[idx++];
this.holders = (List<Address>)objs[idx++];
this.locktype = (int)objs[idx];
}
internal LockInfo(string name)
{
this.name = name;
this.locktype = UNLOCK;
}
/// <summary>
/// Disposes of the wait semaphore
/// </summary>
public void Dispose()
{
this.Dispose(true);
}
private bool disposed;
private void Dispose(bool disposing)
{
lock (this)
{
if (this.disposed)
{
return;
}
this.disposed = true;
}
if (this.wait != null)
{
this.wait.Dispose();
}
}
}
internal Dictionary<string, LockInfo> LocksList = new Dictionary<string, LockInfo>(100);
internal LockObject LocksListLock = new LockObject("LocksListLock");
internal delegate void LockDel(int action, string lockName, Address who, int counter);
internal bool LockingInUse = false;
internal int lockCurrentViewid;
internal void LockNewView(View v)
{
this.lockCurrentViewid = v.viewid;
if (v.leavers.Length == 0 || !this.LockingInUse || v.members.Length == 0)
{
return;
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("LockNewView " + v);
}
List<LockInfo> toGrant = new List<LockInfo>();
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
foreach (KeyValuePair<string, LockInfo> kvp in this.LocksList)
{
LockInfo li = kvp.Value;
List<LockReq> newWantLock = new List<LockReq>();
foreach (LockReq lr in li.wantLock)
{
if (v.GetRankOf(lr.who) != -1)
{
newWantLock.Add(lr);
}
}
li.wantLock = newWantLock;
if (li.islocked && li.locktype == READLOCK)
{
// Readlocks always break if the holder fails
List<Address> newHolders = new List<Address>();
foreach (Address who in li.holders)
{
if (v.GetRankOf(who) != -1)
{
newHolders.Add(who);
}
}
li.holders = newHolders;
li.islocked = li.holders.Count > 0;
}
else if (li.islocked && li.locktype == WRITELOCK)
{
// For Writelocks the policy determines the action to take
if (v.GetRankOf(li.holders[0]) == -1)
{
switch (li.policy)
{
case LOCK_TRANSFER:
LockBroken lb = li.Notify ?? this.DefaultCallback;
if (lb != null)
{
lb(LOCK_TRANSFER, li.name, li.holders[0]);
}
li.holders = new List<Address> { v.members[0] };
continue;
case LOCK_RELEASE:
lb = li.Notify ?? this.DefaultCallback;
if (lb != null)
{
lb(LOCK_RELEASE, li.name, li.holders[0]);
}
li.holders = new List<Address>();
li.islocked = false;
if (li.wantLock.Count > 0)
{
toGrant.Add(li);
}
continue;
}
}
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("After NEWVIEW state=" + Environment.NewLine + this.GetLockState());
}
foreach (LockInfo tg in toGrant)
{
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("After NEWVIEW GrantLock " + tg.name);
}
if (tg.wantLock.Count > 0)
{
this.GrantLock(tg.name ?? "Name not known", tg);
}
}
}
}
}
/// <summary>
/// Obtains a write lock within a group on a resource, given a string naming that resource
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>Lock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool Lock(string lockName)
{
return this.Lock(lockName, int.MaxValue, WRITELOCK);
}
/// <summary>
/// Obtains a read lock within a group on a resource, given a string naming that resource
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>ReadLock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool ReadLock(string lockName)
{
return this.Lock(lockName, int.MaxValue, READLOCK);
}
/// <summary>
/// Obtains a write lock within a group on a resource, given a string naming that resource
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>WriteLock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool WriteLock(string lockName)
{
return this.Lock(lockName, int.MaxValue, WRITELOCK);
}
/// <summary>
/// Obtains a write lock; gives up if the timeout expires
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <param name="timeout">Timeout in ms</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>Lock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool Lock(string lockName, int timeout)
{
return this.Lock(lockName, timeout, WRITELOCK);
}
/// <summary>
/// Obtains a read lock; gives up if the timeout expires
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <param name="timeout">Timeout in ms</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>ReadLock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool ReadLock(string lockName, int timeout)
{
return this.Lock(lockName, timeout, READLOCK);
}
/// <summary>
/// Obtains a write lock; gives up if the timeout expires
/// </summary>
/// <param name="lockName">Name of the lock; any string will do</param>
/// <param name="timeout">Timeout in ms</param>
/// <returns>True if successful, false for failure</returns>
/// <remarks>WriteLock(x) can fail in two ways: by timing out, or (if LOCK_EPHEMERAL_EXTERN was specified) in the event
/// that the group being used drops below the SafeSendThreshold, which causes SafeSend to fail.</remarks>
public bool WriteLock(string lockName, int timeout)
{
return this.Lock(lockName, timeout, WRITELOCK);
}
internal bool Lock(string lockName, int timeout, int locktype)
{
if (!this.VsyncCallStart())
{
return false;
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("Lock[" + this.gname + "](" + lockName + "::" + ltype[locktype] + "; timeout=" + ((timeout == int.MaxValue) ? "*" : timeout.ToString(CultureInfo.InvariantCulture)) + ")" + Environment.NewLine + this.GetLockState());
}
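if ((this.flags & G_ISLARGE) != 0)
{
// Balance the earlier VsyncCallStart so PrepareForTermination isn't left waiting on this call
this.VsyncCallDone();
throw new VsyncException("Locking: Not supported in large groups");
}
if ((this.flags & G_TERMINATING) != 0)
{
// Balance the earlier VsyncCallStart before giving up
this.VsyncCallDone();
return false;
}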
this.LockingInUse = true;
int tid = -1;
LockInfo lstate;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
if (!this.LocksList.TryGetValue(lockName, out lstate))
{
lstate = new LockInfo(lockName);
this.LocksList[lockName] = lstate;
}
if (!lstate.requestPending)
{
lstate.requestPending = true;
}
else
{
// Balance the earlier VsyncCallStart before reporting the error
this.VsyncCallDone();
throw new VsyncException("g.Lock(" + lockName + "): Reentrancy not currently supported");
}
}
bool LockSendFailed = false;
try
{
this.LockSend(LOCKIT, lockName, Vsync.my_address, locktype);
}
catch (VsyncSafeSendException)
{
// Occurs in a group where SafeSendThreshold isn't set to ALL and where
// LOCK_EPHEMERAL_EXTERN was specified, if the group size drops below SafeSendThreshold
LockSendFailed = true;
}
if (timeout < int.MaxValue)
{
tid = Vsync.OnTimerThread(Math.Max(100, timeout), () => this.LockSend(CANCEL, lockName, Vsync.my_address, locktype));
}
if (!LockSendFailed)
{
// Wait for the lock, but give up if the group starts to Terminate.
if (WaitHandle.WaitAny(new WaitHandle[] { lstate.wait, interruptLockWaits }) == 1)
{
this.VsyncCallDone();
return false;
}
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("After Lock(" + lockName + "): got-lock=" + (lstate.islocked && lstate.holders.Contains(Vsync.my_address)) + ":" + Environment.NewLine + this.GetLockState());
}
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
if (tid != -1)
{
Vsync.TimerCancel(tid);
}
lstate.requestPending = false;
this.VsyncCallDone();
return lstate.islocked && lstate.holders.Contains(Vsync.my_address);
}
}
/// <summary>
/// Releases a lock
/// </summary>
/// <param name="lockName">Name of the lock to be released</param>
public void Unlock(string lockName)
{
if (!this.VsyncCallStart())
{
return;
}
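if ((this.flags & G_ISLARGE) != 0)
{
// Balance the earlier VsyncCallStart so PrepareForTermination isn't left waiting on this call
this.VsyncCallDone();
throw new VsyncException("Locking: Not supported in large groups");
}
if ((this.flags & G_TERMINATING) != 0)
{
// Balance the earlier VsyncCallStart before giving up
this.VsyncCallDone();
return;
}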
this.LockingInUse = true;
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("UnLock[" + this.gname + "](" + lockName + ")");
}
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
LockInfo lstate;
if (!this.LocksList.TryGetValue(lockName, out lstate) || !lstate.islocked)
{
// Balance the earlier VsyncCallStart before reporting the error
this.VsyncCallDone();
throw new VsyncException("g.Unlock(" + lockName + "), but it wasn't locked in:" + Environment.NewLine + this.GetLockState());
}
}
this.LockSend(RELEASEIT, lockName, Vsync.my_address, UNLOCK);
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("After Unlock(" + lockName + "):" + Environment.NewLine + this.GetLockState());
}
this.VsyncCallDone();
}
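// Usage sketch for the locking API above. This is an illustration only: g is a hypothetical, already-joined
// small Group, "printer" is an arbitrary lock name, and PrintReport is a hypothetical application method.
//   if (g.Lock("printer", 5000))        // write lock; gives up after roughly 5 seconds
//   {
//       try
//       {
//           PrintReport();
//       }
//       finally
//       {
//           g.Unlock("printer");        // throws VsyncException if the lock isn't currently locked
//       }
//   }
// Requesting the same lock name again from this process while the first request is still pending
// throws a VsyncException, since reentrancy isn't currently supported.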
private static int COUNTER;
internal void LockSend(int command, string lname, Address who, int locktype)
{
COUNTER++;
switch (this.LockPackageConfig)
{
case LOCK_INTERNAL:
this.OrderedSend(Vsync.LOCKREQ, command, lname, who, (Vsync.my_pid * 10000) + COUNTER, locktype);
return;
case LOCK_EPHEMERAL_EXTERN:
this.SafeSend(Vsync.LOCKREQ, command, lname, who, (Vsync.my_pid * 10000) + COUNTER, locktype);
return;
case LOCK_RECOVER_EXTERN:
this.SafeSend(Vsync.LOCKREQ, command, lname, who, (Vsync.my_pid * 10000) + COUNTER, locktype);
return;
}
}
internal void GrantLock(string lockName, LockInfo lstate)
{
string toWhom;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
do
{
LockReq lr = lstate.wantLock.First();
if (!lstate.wantLock.Remove(lr))
{
throw new VsyncException("GrantLock: unable to remove " + lr);
}
lstate.islocked = true;
lstate.locktype = lr.how;
lstate.holders.Add(lr.who);
if (lr.who.isMyAddress())
{
toWhom = " GRANTED TO ME!";
lstate.wait.Release();
}
else
{
toWhom = " to " + lr;
}
}
while (lstate.locktype == READLOCK && lstate.wantLock.Count() > 0 && lstate.wantLock.First().how == READLOCK);
}
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("-------LOCK " + lockName + toWhom);
}
if (this.LockPackageConfig == LOCK_RECOVER_EXTERN)
{
this.MakeCheckpoint(this.theView);
}
}
internal void CancelLockRequest(string LockName, Address who, LockInfo lstate)
{
if ((VsyncSystem.Debug & VsyncSystem.LOCKS) != 0)
{
Vsync.WriteLine("-------CANCEL " + LockName + " request by " + who);
}
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
foreach (LockReq wl in lstate.wantLock)
{
if (wl.who == who)
{
if (!lstate.wantLock.Remove(wl))
{
throw new VsyncException("CancelLockRequest: unable to remove " + who);
}
break;
}
}
if (who.isMyAddress())
{
lstate.wait.Release();
}
}
if (this.LockPackageConfig == LOCK_RECOVER_EXTERN)
{
this.MakeCheckpoint(this.theView);
}
}
/// <summary>
/// Given the name of a lock, returns the current holders if the lock is locked. Returns null if unlocked.
/// There will be at most one holder if locked for writing, but could be multiple holders if locked for reading
/// </summary>
/// <param name="lockName">Name of the lock</param>
/// <returns>Addresses of the current holders, if any</returns>
public List<Address> Holder(string lockName)
{
if ((this.flags & G_ISLARGE) != 0)
{
throw new VsyncException("Locking: Not supported in large groups");
}
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
LockInfo state;
if (this.LocksList.TryGetValue(lockName, out state) && state.islocked)
{
return state.holders;
}
}
return null;
}
internal string GetLockState()
{
int cnt = 0;
string s = string.Empty;
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
foreach (KeyValuePair<string, LockInfo> kvp in this.LocksList)
{
string llist = string.Empty;
if (kvp.Value.wantLock != null)
{
foreach (LockReq wl in kvp.Value.wantLock)
{
llist += wl + " ";
}
}
s += " USER-DEFINED LOCK: {" + kvp.Key + "::" + (kvp.Value.islocked ?
(ltype[kvp.Value.locktype] + " by " + Address.VectorToString(kvp.Value.holders.ToArray())) : "Not locked") +
(llist.Length > 0 ? (", wanted-by: " + llist) : ", no waiting requests") + "}" + Environment.NewLine;
++cnt;
}
}
return cnt > 0 ? s : string.Empty;
}
/// <summary>
/// Specifies the desired handling for write locks in the event that a crash occurs while someone is holding the lock.
/// Note that read locks are not affected by the policy and are always released if a crash occurs while someone holds the lock
/// </summary>
/// <param name="lockName">Name of the lock</param>
/// <param name="lockPolicy">LOCK_TRANSFER transfers write locks to the rank0 member, LOCK_RELEASE releases write locks </param>
/// <remarks>It doesn't make sense to use the LOCK_RELEASE policy if the service as a whole is running in the slow,
/// very conservative, LOCK_RECOVER_EXTERN mode. </remarks>
public void SetLockPolicy(string lockName, int lockPolicy)
{
this.SetLockPolicy(lockName, lockPolicy, null);
}
/// <summary>
/// Specifies the desired handling for write locks in the event that a crash occurs while someone is holding the lock.
/// Note that read locks are not affected by the policy and are always released if a crash occurs while someone holds the lock
/// </summary>
/// <param name="lockName">Lock name</param>
/// <param name="lockPolicy">LOCK_TRANSFER transfers write locks to the rank0 member, LOCK_RELEASE releases write locks </param>
/// <param name="del">Method to notify when the lock is transferred or released because its holder failed</param>
/// <remarks>It doesn't make sense to use the LOCK_RELEASE policy if the service as a whole is running in the slow,
/// very conservative, LOCK_RECOVER_EXTERN mode. </remarks>
public void SetLockPolicy(string lockName, int lockPolicy, LockBroken del)
{
if ((this.flags & G_ISLARGE) != 0)
{
throw new VsyncException("Locking: Not supported in large groups");
}
this.LockingInUse = true;
if (lockPolicy == LOCK_RELEASE && this.LockPackageConfig == LOCK_RECOVER_EXTERN)
{
Vsync.WriteLine("WARNING: Lock policy inconsistency: LOCK_RELEASE for lock " + lockName + " in LOCK_RECOVER_EXTERN mode!");
}
using (var tmpLockObj = new LockAndElevate(this.LocksListLock))
{
LockInfo lstate;
if (!this.LocksList.TryGetValue(lockName, out lstate))
{
lstate = new LockInfo(lockName);
this.LocksList[lockName] = lstate;
}
lstate.policy = lockPolicy;
lstate.Notify = del;
}
}
/// <summary>
/// Parameters controlling the overall package
/// </summary>
/// <param name="config">LOCK_INTERNAL, LOCK_EPHEMERAL_EXTERN, LOCK_RECOVER_EXTERN</param>
/// <remarks>The SetLockPolicies control API allows the lock package user to indicate to Vsync how the locking system is being
/// used. With configuration parameter LOCK_INTERNAL, locking is understood to run on a "soft state" resource with purely in-memory
/// state. This is fastest, but is safe only if you have good reason to know that your group won't need to tolerate failures
/// in which all members fail. We recommend it for groups that have some "internal" synchronization need, but there may also be
/// cloud settings in which this is a safe choice based on information you might have about the likelihood of a total failure.
///
/// With the LOCK_EPHEMERAL_EXTERN policy, the locking package is configured to deal with an external resource but still forgets
/// the lock state in the event of a total failure. This is slower than LOCK_INTERNAL but not dramatically so, and has the advantage
/// that it can safely be used if the goal is to lock access to, say, a printer. Locks are released if the service crashes but if you
/// know that the external resource won't be accessed during those periods, you may be safe with this choice.
///
/// The slowest but safest option is LOCK_RECOVER_EXTERN. Here we are pretty obsessive about logging state and can restore the service
/// after any possible failure with the same lock state it was in prior to the failure. Costs are high, and this makes sense only with the
/// LOCK_TRANSFER option for handling of failures, since ALL the old members will have crashed. If you would use LOCK_RELEASE as a policy,
/// you definitely don't need LOCK_RECOVER_EXTERN as a configuration state. </remarks>
public void SetLockPolicies(int config)
{
this.SetLockPolicies(config, null);
}
/// <summary>
/// Parameters controlling the overall package
/// </summary>
/// <param name="config">LOCK_INTERNAL, LOCK_EPHEMERAL_EXTERN, LOCK_RECOVER_EXTERN</param>
/// <param name="del">Default callback for lock transfer or broken events</param>
/// <remarks>The SetLockPolicies control API allows the lock package user to indicate to Vsync how the locking system is being
/// used. With configuration parameter LOCK_INTERNAL, locking is understood to run on a "soft state" resource with purely in-memory
/// state. This is fastest, but is safe only if you have good reason to know that your group won't need to tolerate failures
/// in which all members fail. We recommend it for groups that have some "internal" synchronization need, but there may also be
/// cloud settings in which this is a safe choice based on information you might have about the likelihood of a total failure.
///
/// With the LOCK_EPHEMERAL_EXTERN policy, the locking package is configured to deal with an external resource but still forgets
/// the lock state in the event of a total failure. This is slower than LOCK_INTERNAL but not dramatically so, and has the advantage
/// that it can safely be used if the goal is to lock access to, say, a printer. Locks are released if the service crashes but if you
/// know that the external resource won't be accessed during those periods, you may be safe with this choice.
///
/// The slowest but safest option is LOCK_RECOVER_EXTERN. Here we are pretty obsessive about logging state and can restore the service
/// after any possible failure with the same lock state it was in prior to the failure. Costs are high, and this makes sense only with the
/// LOCK_TRANSFER option for handling of failures, since ALL the old members will have crashed. If you would use LOCK_RELEASE as a policy,
/// you definitely don't need LOCK_RECOVER_EXTERN as a configuration state. </remarks>
public void SetLockPolicies(int config, LockBroken del)
{
if ((this.flags & G_ISLARGE) != 0)
{
throw new VsyncException("Locking: Not supported in large groups");
}
this.LockingInUse = true;
this.LockPackageConfig = config;
this.DefaultCallback = del;
switch (config)
{
case LOCK_INTERNAL:
// This is the default
return;
case LOCK_EPHEMERAL_EXTERN:
// Forces use of SafeSend
return;
case LOCK_RECOVER_EXTERN:
// Uses SafeSend+Logging
if (this.safeSendDurabilityMethod == null)
{
throw new VsyncException("Recoverable mode for lock package requires SafeSend DiskDurabilityMode");
}
return;
default:
throw new VsyncException("Unrecognized lock policy configuration value: " + config);
}
}
/// <exclude></exclude>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class DHTItem
{
/// <exclude></exclude>
[ProtoMember(1)]
public readonly object value;
/// <exclude></exclude>
[ProtoMember(2)]
public readonly Address createdBy;
/// <exclude></exclude>
[ProtoMember(3)]
public readonly long createTimestamp;
#if !PROTOCOL_BUFFERS
/// <exclude></exclude>
public DHTItem()
{
}
#endif
internal DHTItem(object value)
{
this.value = value;
this.createdBy = Vsync.my_address;
this.createTimestamp = DateTime.UtcNow.Ticks;
}
internal DHTItem(DHTItem existingItem, object newValue)
{
this.value = newValue;
this.createdBy = existingItem.createdBy;
this.createTimestamp = existingItem.createTimestamp;
}
}
internal Dictionary<object, DHTItem> DHTContents = new Dictionary<object, DHTItem>(1000);
/// <summary>
/// Returns a clone containing only those (key,value) tuples that have the specified key and value types, or the full set if
/// the type is specified as "object". The result is a copy: changes to it won't be reflected into
/// the "original", which is maintained internally by Vsync and not directly accessible to the user.
/// </summary>
/// <typeparam name="KT">Key type</typeparam>
/// <typeparam name="VT">Value type</typeparam>
public IEnumerable<KeyValuePair<KT, VT>> DHT<KT, VT>()
{
List<KeyValuePair<KT, VT>> rval = new List<KeyValuePair<KT, VT>>();
if (!this.VsyncCallStart())
{
return rval;
}
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
foreach (KeyValuePair<object, DHTItem> kvp in this.DHTContents)
{
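// Use "is" rather than exact type equality so that KT/VT = object selects every entry, as documented,
// and so that a null value is skipped instead of throwing
if (kvp.Key is KT && kvp.Value.value is VT)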
{
rval.Add(new KeyValuePair<KT, VT>((KT)kvp.Key, (VT)kvp.Value.value));
}
}
}
this.VsyncCallDone();
return rval;
}
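// Usage sketch (illustrative only; g is a hypothetical Group whose DHT holds string keys with int values):
//   foreach (KeyValuePair<string, int> kvp in g.DHT<string, int>())
//   {
//       Console.WriteLine(kvp.Key + " => " + kvp.Value);
//   }
// Because the result is a copy, modifying it does not change the DHT contents held by Vsync.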
/// <summary>
/// Returns a clone of the full contents of the Vsync DHT
/// </summary>
public IEnumerable<KeyValuePair<object, object>> DHT()
{
List<KeyValuePair<object, object>> rval = new List<KeyValuePair<object, object>>();
if (!this.VsyncCallStart())
{
return rval;
}
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
foreach (KeyValuePair<object, DHTItem> kvp in this.DHTContents)
{
rval.Add(new KeyValuePair<object, object>(kvp.Key, kvp.Value.value));
}
}
this.VsyncCallDone();
return rval;
}
/// <summary>
/// Returns an integer (small non-negative numbers 0... nmembers/replication factor) indicating which DHT partition this member belongs to
/// </summary>
/// <returns></returns>
public int DHTGetPartition()
{
return this.GetAffinityGroup(this.GetMyRank());
}
/// <summary>
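/// Returns my rank within my DHT partition: the number of members of my partition that precede me in the current view
/// </summary>
/// <returns>A non-negative rank (0 for the first member of the partition), or -1 if this member has no partition or is not in the view</returns>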
public int DHTGetPartitionRank()
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
int rank = 0;
int myAg = this.GetAffinityGroup(this.GetMyRank());
if (myAg >= 0)
{
for (int i = 0; i < theView.members.Length; i++)
{
if (theView.members[i].isMyAddress())
{
return rank;
}
if (this.GetAffinityGroup(i) == myAg)
{
++rank;
}
}
}
return -1;
}
internal LockObject DHTDictLock = new LockObject("DHTLock");
internal int myDHTBinSize;
internal int myDHTMinSize;
internal int myDHTnShards;
internal Func<object, object, object, object> myResolver;
internal View curDHTView;
internal LockObject DHTViewLock = new LockObject("DHTViewLock");
internal bool myDHTInDebugMode = false;
internal bool[,] DHTHMaps;
internal bool[] DHTAgNonEmpty;
internal int myTargetGroupSize;
// How many existing members push DHT content to you when membership shuffles due to churn
internal const int DHTRedundancyFactor = 2;
// Lifetime of a DHT item in milliseconds; set by DHTEnable when a timeout is requested
internal long myDHTItemTimeout;
// Caller holds the lock, so no need to acquire it here
internal void DHTWriter(object key, DHTItem item)
{
DHTItem dhi = null;
bool collision = this.DHTContents.ContainsKey(key);
if (collision)
{
dhi = this.DHTContents[key];
}
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_PUT: kvp=<" + key + "::" + item.value + "(AG=" + this.GetAffinityGroup(DHTKeyHash(key)) + ", myAG=" + this.GetAffinityGroup(this.GetMyRank()) + ")>" + (collision ? " ** COLLISION **" : string.Empty));
}
if (collision)
{
bool resolved = false;
object newValue = null;
// If the resolver was specified, let it decide what to do
if (this.theResolvers.hList.Count > 0)
{
object[] obs = { key, dhi.value, item.value };
foreach (CallBack del in this.theResolvers.hList)
{
if (DHTTypeMatch(obs, del) && del.cbProc.nParams == 3)
{
#if !(__MonoCS__)
if (del.cbProc.cb != null)
{
newValue = ((dynamic)del.cbProc.cb).Invoke((dynamic)obs[0], (dynamic)obs[1], (dynamic)obs[2]);
}
else
{
MethodInfo mi = del.cbProc.hisCb.GetType().GetMethod("Invoke");
newValue = mi.Invoke(del.cbProc.hisCb, obs);
}
#else
newValue = del.cbProc.hisCb.DynamicInvoke(obs);
#endif
resolved = true;
break;
}
}
}
if (!resolved && this.myResolver != null)
{
newValue = this.myResolver(key, dhi.value, item.value);
resolved = true;
}
if (resolved)
{
this.DHTContents.Remove(key);
if (!this.usesDHTDefaults)
{
this.DHTExtWriter(key, null);
}
if (newValue == null)
{
return;
}
DHTItem dhtItem = this.usesDHTDefaults ? new DHTItem(item, newValue) : new DHTItem(item, this.DHTExtWriter(key, newValue));
this.DHTContents.Add(key, dhtItem);
return;
}
// If two puts are done on the same key, keep the Put with the larger create time. This is a wall-clock time of day
// In the whole of Vsync, this is the only line of code sensitive to the quality of clock synchronization....
if (dhi.createTimestamp >= item.createTimestamp)
{
return;
}
if (!this.DHTContents.Remove(key))
{
Vsync.WriteLine("WARNING: unexpected failure to remove " + key);
}
if (!this.usesDHTDefaults)
{
this.DHTExtWriter(key, null);
}
}
DHTItem dhtItem2 = this.usesDHTDefaults ? item : new DHTItem(item, this.DHTExtWriter(key, item.value));
this.DHTContents.Add(key, dhtItem2);
}
internal object DHTReader(object key)
{
DHTItem dhi;
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
this.DHTContents.TryGetValue(key, out dhi);
}
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_GET: key=<" + key + ">" + (dhi != null ? " found (value=" + dhi.value + ")" : " not found"));
}
if (dhi != null)
{
if (this.usesDHTDefaults)
{
return dhi.value;
}
return this.DHTExtReader(key, (string)dhi.value);
}
return null;
}
internal bool usesDHTDefaults = true;
/// <summary>
/// By specifying a DHTPutMethod, the user can provide a method that stores the actual value objects
/// For example, they could be stored in external files. The PutMethod returns a locator string
/// </summary>
/// <param name="Key">The key of the object to be stored. If reused, the object is being overwritten</param>
/// <param name="Value">The value of the object to be stored. Null if the object is being deleted</param>
/// <returns>A locator string, later passed to the DHTGetMethod to fetch the stored value</returns>
public delegate string DHTPutMethod(object Key, object Value);
/// <summary>
/// Called when fetching a value for an externally stored object
/// </summary>
/// <param name="Key">Key of the externally stored object</param>
/// <param name="locator">string that was provided by the DHTPutMethod when it was last saved</param>
/// <returns>The value object that was stored under this key</returns>
public delegate object DHTGetMethod(object Key, string locator);
internal DHTPutMethod DHTExtWriter;
internal DHTGetMethod DHTExtReader;
/// <summary>
/// Allows a developer to create a DHT that stores its data in a non-standard way. The default methods keep the DHT data in memory
/// This API is mostly intended to support DHTs in which the actual values are stored in the local file system by the group members
/// </summary>
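/// <param name="writerMethod">Method that stores a value externally and returns a locator string; called with a null value when the object is deleted</param>
/// <param name="readerMethod">Method that fetches an externally stored value, given its key and the locator string returned by the writer</param>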
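/// <example>
/// A minimal sketch, not a definitive implementation. It assumes <c>g</c> is a hypothetical reference to the group being configured,
/// that every value is a byte array, that System.IO is imported, and that the directory and file-naming scheme are acceptable
/// (both are illustrative; naming files by hash code is not collision-free, so a real implementation needs a better scheme).
/// <code>
/// string dir = Path.Combine(Path.GetTempPath(), "dht-values");   // hypothetical location
/// Directory.CreateDirectory(dir);
/// g.SetDHTPersistenceMethods(
///     (key, value) =>
///     {
///         string locator = Path.Combine(dir, key.GetHashCode().ToString("x8") + ".bin");
///         if (value == null)
///         {
///             File.Delete(locator);          // a null value means the object is being deleted
///             return null;
///         }
///         File.WriteAllBytes(locator, (byte[])value);
///         return locator;                    // kept in the DHT in place of the value itself
///     },
///     (key, locator) => File.ReadAllBytes(locator));
/// </code>
/// </example>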
public void SetDHTPersistenceMethods(DHTPutMethod writerMethod, DHTGetMethod readerMethod)
{
this.usesDHTDefaults = false;
this.DHTExtWriter = writerMethod;
this.DHTExtReader = readerMethod;
}
/// <summary>
/// Puts the current group into DHT mode, but without specifying a target size and without setting aside extra members.
/// </summary>
/// <param name="ReplicationFactor">Requested data replication factor</param>
/// <param name="ExpectedGroupSize">Your estimate of the typical size of this group (N); a multiple of the replication factor</param>
/// <param name="MinimumGroupSize">The smallest group size at which the DHT will accept DHTPut/DHTGet commands</param>
/// <remarks>If the replication factor is too small, you run the risk that our random hashing scheme could leave some affinity
/// group with too few, or too many members. Too many is not a problem, but if an affinity group has too few members you can see
/// failures that cause data to be completely lost (e.g. if the only replica fails). Thus when you ask for a replication factor of,
/// say, 3 this is a <i>target</i>, not the minimum that might be used. We haven't had problems with factors of 5 or more in groups of 25 or more members.
///
/// In the event of membership changes, the group will automatically recompute the DHT hash partitioning function and ship (key,value) pairs around
/// to ensure that each member has the correct items. This can be a substantial overhead in heavily populated DHTs with large numbers of members.
/// In such cases, use the full DHTEnable API and indicate to the system a size you are targeting for the active DHT, and a number of "spare"
/// members that the system can leave out of the DHT (for example, you might target 1000 members, and allow 5 spares; if the group were to have
/// 1005 members (or more), N-5 will run the DHT, while the remaining 5 play a backup role).
///
/// With spares, if a member fails, one of the spares will be dropped in to replace it. This limits the amount of reinitialization and
/// "churning" of (key,value) tuples: that spare needs to be initialized, but other members retain their roles and no large-scale churn occurs.
/// Then you can launch extra spares if the number starts to drop towards your specified target.
///
/// When the group has fewer than the ExpectedGroupSize members, the spare mechanism is disabled. So, continuing our example above, with 950 members, no spares
/// would be available and the DHT would indeed churn if a failure or join changed the DHT hash-based partitioning of the members.
///
/// To put the DHT into debug mode for developing your application, you can use ReplicationFactor, ExpectedGroupSize and MinimumGroupSize all set to 1,
/// and then test with just 1 or 2 members in the group,
/// but remember to use reasonable values later when your group will have more than 1 member.</remarks>
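/// <example>
/// A usage sketch under stated assumptions: <c>g</c> is a hypothetical reference to the group being configured, and the sizes are illustrative only.
/// <code>
/// // Replication factor 3, expected size 30 (a multiple of 3, giving 10 shards),
/// // and DHTPut refused until the group has at least 10 members.
/// g.DHTEnable(3, 30, 10);
///
/// // Debug mode for testing with 1 or 2 members (use instead of the call above,
/// // since DHTEnable can be called only once per group):
/// // g.DHTEnable(1, 1, 1);
/// </code>
/// </example>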
public void DHTEnable(int ReplicationFactor, int ExpectedGroupSize, int MinimumGroupSize)
{
this.DHTEnable(ReplicationFactor, ExpectedGroupSize, MinimumGroupSize, int.MaxValue);
}
/// <summary>
/// Puts the current group into DHT mode with an item timeout, but without specifying a target size and without setting aside extra members.
/// </summary>
/// <param name="ReplicationFactor">Requested data replication factor</param>
/// <param name="ExpectedGroupSize">Your estimate of the typical size of this group (N); a multiple of the replication factor</param>
/// <param name="MinimumGroupSize">The smallest group size at which the DHT will accept DHTPut/DHTGet commands</param>
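/// <param name="timeout">Timeout in milliseconds after which a (Key,Value) pair is automatically discarded; values below 2500 are raised to 2500, and a value that is not positive, or int.MaxValue, disables the timeout.</param>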
/// <remarks>If the replication factor is too small, you run the risk that our random hashing scheme could leave some affinity
/// group with too few, or too many members. Too many is not a problem, but if an affinity group has too few members you can see
/// failures that cause data to be completely lost (e.g. if the only replica fails). Thus when you ask for a replication factor of,
/// say, 3 this is a <i>target</i>, not the minimum that might be used. We haven't had problems with factors of 5 or more in groups of 25 or more members.
///
/// In the event of membership changes, the group will automatically recompute the DHT hash partitioning function and ship (key,value) pairs around
/// to ensure that each member has the correct items. This can be a substantial overhead in heavily populated DHTs with large numbers of members.
/// In such cases, use the full DHTEnable API and indicate to the system a size you are targeting for the active DHT, and a number of "spare"
/// members that the system can leave out of the DHT (for example, you might target 1000 members, and allow 5 spares; if the group were to have
/// 1005 members (or more), N-5 will run the DHT, while the remaining 5 play a backup role).
///
/// With spares, if a member fails, one of the spares will be dropped in to replace it. This limits the amount of reinitialization and
/// "churning" of (key,value) tuples: that spare needs to be initialized, but other members retain their roles and no large-scale churn occurs.
/// Then you can launch extra spares if the number starts to drop towards your specified target.
///
/// When the group has fewer than the ExpectedGroupSize members, the spare mechanism is disabled. So, continuing our example above, with 950 members, no spares
/// would be available and the DHT would indeed churn if a failure or join changed the DHT hash-based partitioning of the members.
///
/// To put the DHT into debug mode for developing your application, you can use ReplicationFactor, ExpectedGroupSize and MinimumGroupSize all set to 1,
/// and then test with just 1 or 2 members in the group,
/// but remember to use reasonable values later when your group will have more than 1 member.</remarks>
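/// <example>
/// A usage sketch with an item timeout; <c>g</c> is a hypothetical reference to the group being configured and the numbers are illustrative.
/// <code>
/// // Same sizing as a 3-way replicated, 30-member DHT, with items discarded once they are older than
/// // 60000 ms. Cleanup runs periodically, so an item can outlive its timeout by up to one period.
/// g.DHTEnable(3, 30, 10, 60000);
/// </code>
/// </example>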
public void DHTEnable(int ReplicationFactor, int ExpectedGroupSize, int MinimumGroupSize, int timeout)
{
if (!this.VsyncCallStart())
{
return;
}
if (timeout > 0 && timeout < int.MaxValue)
{
this.myDHTItemTimeout = Math.Max(timeout, 2500);
Vsync.OnTimer(this.myDHTItemTimeout, this.DHTCleanup);
}
if (this.myDHTBinSize != 0)
{
throw new VsyncDHTException("Can't call DHTEnable more than once for the same group");
}
if (ExpectedGroupSize % ReplicationFactor != 0)
{
throw new VsyncDHTException("Target Vsync DHT group size isn't a multiple of the replication factor");
}
this.myDHTMinSize = MinimumGroupSize;
this.myDHTBinSize = ReplicationFactor;
this.myDHTnShards = ExpectedGroupSize / ReplicationFactor;
this.lastVersionId = new int[this.myDHTnShards];
this.retainedKVPS = new List<object[]>[this.myDHTnShards];
for (int n = 0; n < this.myDHTnShards; n++)
{
this.retainedKVPS[n] = new List<object[]>();
}
if (MinimumGroupSize < this.myDHTnShards)
{
throw new VsyncDHTException("DHTEnable: the minimum size must be at least ExpectedGroupSize/ReplicationFactor");
}
this.myTargetGroupSize = ExpectedGroupSize;
if (ReplicationFactor == 1 && ExpectedGroupSize == 1 && MinimumGroupSize == 1)
{
this.myDHTInDebugMode = true;
}
else if (ReplicationFactor < 2 || this.myDHTnShards < 2)
{
throw new VsyncDHTException("DHTEnable: Replication factor must be >= 2 and (ExpectedGroupSize / ReplicationFactor) must be > 1");
}
this.doRegister(Vsync.IM_DHT_PUT, new Action<object>(kvps => this.doDHTPut(kvps, 0, -1, -1)));
this.doRegister(Vsync.IM_DHT_PUT, new Action<object, int, int, int>(this.doDHTPut));
this.doRegister(Vsync.IM_DHT_PUT, new Action<int, int, int>((sendingShard, versionId, viewId) => this.doDHTPut(null, sendingShard, versionId, viewId)));
this.doRegister(Vsync.IM_DHT_GET, new Action<object>(keys =>
{
int myAg = this.GetAffinityGroup(Vsync.my_address);
List<KeyValuePair<object, object>> result = new List<KeyValuePair<object, object>>();
foreach (object key in (IEnumerable)keys)
{
int khash = this.DHTKeyHash(key);
// Filter if not for my affinity group; shouldn't happen at all
if (this.GetAffinityGroup(khash) == myAg && this.theView.GetMyRank() < this.theView.members.Length)
{
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
object value;
if ((value = this.DHTReader(key)) != null)
{
result.Add(new KeyValuePair<object, object>(key, value));
}
}
}
else if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_GET handler: filtering and ignoring a get to a different affinity group");
}
}
// This is necessary because of a bug in the Message library: it seems to have a problem with zero-length
// lists of KeyValuePair<kt,vt> objects and encodes them as zero-length lists of type object..
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
string ks = " ";
foreach (object key in (IEnumerable)keys)
{
ks += key + "[Ag=" + this.GetAffinityGroup(this.DHTKeyHash(key)) + "] ";
}
Vsync.WriteLine("DHT_GET handler[Ag=" + myAg + "]: I was asked about keys " + ks + " and found " + result.Count + " matches");
}
if (result.Count > 0)
{
this.doReply(result);
}
else
{
this.doNullReply();
}
}));
// Uses a special hook: *= will put the handler at the front of the queue, ensuring that the DHT mapping is
// recomputed before anyone might try and use it.
this.ViewHandlers *= v =>
{
if (v.members.Length == 0)
{
return;
}
using (var tmpLockObj = new LockAndElevate(this.DHTViewLock))
{
this.curDHTView = v;
}
int nActive = v.members.Length;
this.DHTHMaps = new bool[this.myDHTnShards, log2(Math.Max(nActive, this.myDHTnShards * this.myDHTBinSize)) + 1];
this.DHTAgNonEmpty = new bool[this.myDHTnShards];
for (int n = 0; n < nActive; n++)
{
int pseudoGrp = this.GetAffinityGroup(v.members[n]);
if (pseudoGrp >= 0)
{
this.DHTHMaps[pseudoGrp, log2(n + 1)] = true;
this.DHTAgNonEmpty[pseudoGrp] = true;
}
}
this.replaylastSTKVPS(v.viewid);
};
// This is special too: we get a chance to shift DHT key-value items around during the PROPOSED step as new views are computed
this.FlushHandlers += this.DHTShuffle;
// Goals here are to (1) Spread the work around, and (2) Make sure each joining member is initialized by a sender
// belonging to the appropriate affinity group. This code is a bit awkward because it only can be used when the
// DHT is sufficiently large, and because a DHT with a number of members that isn't an exact multiple of
// myDHTnShards might have a few "extra" members at the end. This is why when rankOfCheckptSender wraps we
// always want to set it back to hisAg, not to rankOfCheckptSender%v.members.Length!
this.RegisterChkptChoser((v, who) =>
{
View prevView;
using (var tmpLockObj = new LockAndElevate(this.DHTViewLock))
{
prevView = this.curDHTView;
}
if (v.joiners.Contains(Vsync.my_address))
{
return false;
}
int hisAg = this.GetAffinityGroup(v, who);
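// Mask off the sign bit so that a negative hash code cannot yield a negative rank
int rankOfCheckptSender = (((who.GetHashCode() & int.MaxValue) % this.myDHTnShards) * this.myDHTnShards) + hisAg;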
List<Address> candidates = new List<Address>();
for (int off = 0; off < prevView.members.Length / this.myDHTnShards; off++)
{
if (rankOfCheckptSender >= prevView.members.Length)
{
rankOfCheckptSender = hisAg;
}
if (!v.leavers.Contains(prevView.members[rankOfCheckptSender]))
{
if (this.GetAffinityGroup(rankOfCheckptSender) != hisAg)
{
throw new VsyncException("Bug in new DHT checkpoint choser (1)");
}
if (!v.joiners.Contains(prevView.members[rankOfCheckptSender]))
{
candidates.Add(prevView.members[rankOfCheckptSender]);
}
}
rankOfCheckptSender += this.myDHTnShards;
}
if (candidates.Count == 0)
{
return true;
}
// Mask off the sign bit so that a negative hash code cannot yield a negative index
return candidates[(who.GetHashCode() & int.MaxValue) % candidates.Count].isMyAddress();
});
this.RegisterMakeChkpt(view =>
{
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
if (this.DHTContents.Count > 0)
{
if (this.usesDHTDefaults)
{
// In-memory case; just send the whole thing in one big message
this.SendChkpt(Msg.toBArray(this.DHTContents.Keys.ToArray()), Msg.toBArray(this.DHTContents.Values.ToArray()));
}
else
{
// External user-defined storage. Here we do the transfer one object at a time because the files
// in external storage could be quite large (a common reason for storing them externally)
foreach (KeyValuePair<object, DHTItem> kvp in this.DHTContents)
{
DHTItem chkptDhtItem = new DHTItem(kvp.Value, this.DHTExtReader(kvp.Key, (string)kvp.Value.value));
this.SendChkpt(Msg.toBArray(new[] { kvp.Key }), Msg.toBArray(new[] { chkptDhtItem }));
}
}
}
}
this.EndOfChkpt();
});
this.RegisterLoadChkpt(new Action<byte[], byte[]>((kba, vba) =>
{
object[] keys = Msg.BArrayToObjects(kba);
object[] values = Msg.BArrayToObjects(vba);
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
for (int i = 0; i < keys.Length; i++)
{
if (this.usesDHTDefaults)
{
if (!this.DHTContents.ContainsKey(keys[i]))
{
this.DHTContents.Add(keys[i], (DHTItem)values[i]);
}
}
else
{
this.DHTWriter(keys[i], (DHTItem)values[i]);
}
}
}
}));
this.VsyncCallDone();
}
private int[] lastVersionId;
private int myDHTLastViewId = -1;
private int myDHTGetHashCodeVID;
private readonly LockObject myDHTGetHashCodeVIDLock = new LockObject("myDHTGetHashCodeVIDLock");
private List<object[]>[] retainedKVPS;
private readonly LockObject lastSTKVPSLock = new LockObject("lastSTKVPSLock");
private List<object[]> lastSTKVPS = new List<object[]>();
private readonly Semaphore waitForPutSema = new Semaphore(0, int.MaxValue);
private void doDHTPut(object kvps, int sendingShard, int versionId, int viewId)
{
int myAg = this.GetAffinityGroup(Vsync.my_address);
using (var tmpLockObj = new LockAndElevate(this.myDHTGetHashCodeVIDLock))
{
if (viewId > this.myDHTLastViewId)
{
using (var tmpLockObj1 = new LockAndElevate(this.lastSTKVPSLock))
{
this.lastSTKVPS.Add(new[] { kvps, sendingShard, versionId, viewId });
}
return;
}
if (versionId > 0)
{
// This branch detects and ignores DHTGetHashCodeChanged-related duplicates. Those never
// come from sendingShard == my own shard Id, so that one entry in lastVersionId is untouched
if (sendingShard == myAg)
{
throw new VsyncException("doDHTPut: sendingShard came through as " + sendingShard + ", which is my own shard!");
}
if (versionId <= this.lastVersionId[sendingShard])
{
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_PUT handler: filtering and ignoring a duplicated shuffle message");
}
return;
}
if (versionId > this.myDHTGetHashCodeVID)
{
// Arrived early...
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_PUT handler: retaining an early shuffle message");
}
this.retainedKVPS[sendingShard].Add(new[] { kvps, sendingShard, versionId, viewId });
return;
}
this.lastVersionId[sendingShard] = versionId;
}
else if (versionId == 0)
{
// This branch uses the lastVersionId[myAg] to hold the viewid, and in that way senses dups
if (viewId <= this.lastVersionId[myAg])
{
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_PUT handler: filtering and ignoring a duplicated state transfer message");
}
return;
}
if (viewId != this.lastVersionId[myAg] + 1 && this.lastVersionId[myAg] != -1)
{
throw new VsyncException("in doDHTPut: viewId of " + viewId + " was unexpected: lastVersionId[myAg=" + myAg + "] was " + this.lastVersionId[myAg]);
}
this.lastVersionId[myAg] = viewId;
}
}
if (kvps != null)
{
foreach (object okvp in (IEnumerable)kvps)
{
object key = okvp.GetType().GetMethod("get_Key").Invoke(okvp, new object[0]);
DHTItem value = (DHTItem)okvp.GetType().GetMethod("get_Value").Invoke(okvp, new DHTItem[0]);
int khash = this.DHTKeyHash(key);
// Filter if not for my affinity group
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (this.GetAffinityGroup(khash) == this.GetAffinityGroup(Vsync.my_address) && theView.GetMyRank() < theView.members.Length)
{
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
this.DHTWriter(key, value);
}
}
else if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
Vsync.WriteLine("DHT_PUT handler: filtering and ignoring a put to a different affinity group");
}
}
}
if (versionId > 0)
{
this.waitForPutSema.Release();
}
}
private void replaylastSTKVPS(int vid)
{
if (vid <= this.myDHTLastViewId)
{
return;
}
this.myDHTLastViewId = vid;
List<object[]> old;
using (var tmpLockObj = new LockAndElevate(this.lastSTKVPSLock))
{
old = this.lastSTKVPS;
this.lastSTKVPS = new List<object[]>();
}
foreach (object[] oo in old)
{
int idx = 0;
object kvps = oo[idx++];
int sendingshard = (int)oo[idx++];
int versionid = (int)oo[idx++];
int viewid = (int)oo[idx];
if (viewid > vid)
{
using (var tmpLockObj = new LockAndElevate(this.lastSTKVPSLock))
{
this.lastSTKVPS.Add(oo);
}
}
else if (viewid == vid)
{
this.doDHTPut(kvps, sendingshard, versionid, viewid);
}
}
}
/// <summary>
/// Called when the application has modified the GetHashCode method used to map keys to shards
/// </summary>
/// <param name="versionId">A positive GetHashCode() "version number" that needs to be identical in all the callers.</param>
/// <remarks>The GetHashCode method needs to change through a series of consistent totally ordered actions, and
/// for each change, the same version number should be used by all members. The number must also increase each time
/// DHTGetHashCodeChanged is called in any given view (for example, you can just use a counter initialized to 1).
/// </remarks>
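/// <example>
/// A sketch of the counter approach mentioned above. <c>g</c> and <c>hashVersion</c> are hypothetical names, and every member
/// is assumed to run these steps at the same point in a totally ordered sequence of events.
/// <code>
/// // hashVersion is an int field initialized to 0 on every member
/// // ... switch the application's key GetHashCode() to its new behaviour here ...
/// g.DHTGetHashCodeChanged(++hashVersion);   // the first call passes 1; the call waits while shards exchange items
/// </code>
/// </example>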
public void DHTGetHashCodeChanged(int versionId)
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (versionId <= 0 || versionId <= this.lastVersionId[this.GetAffinityGroup(theView.GetMyRank())])
{
throw new VsyncDHTException("DHTGetHashCodeChanged: version number must be > 0, and must increase each time this method is called in a given view");
}
using (var tmpLockObj = new LockAndElevate(this.myDHTGetHashCodeVIDLock))
{
this.myDHTGetHashCodeVID = versionId;
}
this.DHTShuffle(theView, versionId, true);
for (int i = 0; i < this.myDHTnShards; i++)
{
List<object[]> kvpsl;
using (var tmpLockObj = new LockAndElevate(this.myDHTGetHashCodeVIDLock))
{
kvpsl = this.retainedKVPS[i];
this.retainedKVPS[i] = new List<object[]>();
}
foreach (object[] oo in kvpsl)
{
int idx = 0;
object kvps = oo[idx++];
int sendingshard = (int)oo[idx++];
int versionid = (int)oo[idx++];
int viewid = (int)oo[idx];
this.doDHTPut(kvps, sendingshard, versionid, viewid);
}
}
for (int cnt = 1; cnt < this.myDHTnShards; cnt++)
{
ILock.NoteThreadState("waitForPutSema.WaitOne()");
this.waitForPutSema.WaitOne();
ILock.NoteThreadState(null);
}
}
private void DHTShuffle(View nextView)
{
this.DHTShuffle(nextView, 0, false);
}
/*
* Shuffles DHT items that are on the wrong node after a view change or a user change to the GetHashCode() method for keys
*/
private void DHTShuffle(View nextView, int versionId, bool shuffleMode)
{
if (nextView.members.Length == 0)
{
return;
}
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (theView == null)
{
return;
}
int myOldRank = theView.GetMyRank();
int myNewRank = nextView.GetMyRank();
int myOldAg = this.GetAffinityGroup(myOldRank), myNewAg = this.GetAffinityGroup(myNewRank), myRankInMyAg = myNewRank / this.myDHTnShards;
if (myOldRank == -1)
{
return;
}
if (!shuffleMode && nextView.leavers.Length > 0)
{
// DHT state reconciliation for existing members that got shifted around by the new view and are no longer in the same affinity group
// Each member looks for the next DHTRedundancyFactor guys who are now new members of his affinity group, but previously weren't, and sends state
int rankOfMember = myOldRank;
int fnd = 0;
int coverFor = 0;
for (int cnt = 1; cnt < this.myDHTnShards && (cnt - coverFor - 1) < DHTRedundancyFactor; cnt++)
{
if (nextView.leavers.Contains(theView.members[(myOldRank + (cnt * this.myDHTnShards)) % theView.members.Length]))
{
++coverFor;
}
}
for (int cnt = 0; cnt < theView.members.Length - 1 && fnd < DHTRedundancyFactor + coverFor; cnt++)
{
rankOfMember = (rankOfMember + 1) % theView.members.Length;
Address who = theView.members[rankOfMember];
int rankInNextView = nextView.GetRankOf(who);
if (rankInNextView != -1 && this.GetAffinityGroup(rankInNextView) == myOldAg && this.GetAffinityGroup(rankInNextView) != this.GetAffinityGroup(rankOfMember))
{
// We found someone who will be in "our" affinity group but isn't currently there
++fnd;
List<KeyValuePair<object, DHTItem>> kvps = new List<KeyValuePair<object, DHTItem>>();
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
foreach (KeyValuePair<object, DHTItem> item in this.DHTContents)
{
kvps.Add(item);
}
}
if (kvps.Count > 0)
{
this.P2PSend(who, Vsync.IM_DHT_PUT, kvps, myNewAg, 0, nextView.viewid);
}
}
}
this.Flush();
}
else if (shuffleMode && myRankInMyAg < DHTRedundancyFactor)
{
List<KeyValuePair<object, DHTItem>>[] kvpsl = new List<KeyValuePair<object, DHTItem>>[this.myDHTnShards];
for (int n = 0; n < this.myDHTnShards; n++)
{
kvpsl[n] = new List<KeyValuePair<object, DHTItem>>();
}
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
foreach (KeyValuePair<object, DHTItem> item in this.DHTContents)
{
kvpsl[this.GetAffinityGroup(this.DHTKeyHash(item.Key))].Add(item);
}
for (int n = 0; n < nextView.members.Length; n++)
{
if (this.GetAffinityGroup(n) != myNewAg)
{
List<KeyValuePair<object, DHTItem>> kvps = kvpsl[this.GetAffinityGroup(n)];
if (kvps.Count > 0)
{
this.P2PSend(nextView.members[n], Vsync.IM_DHT_PUT, kvps, myNewAg, versionId, nextView.viewid);
}
else
{
this.P2PSend(nextView.members[n], Vsync.IM_DHT_PUT, myNewAg, versionId, nextView.viewid);
}
}
}
}
}
if (this.GetAffinityGroup(myOldRank) != this.GetAffinityGroup(myNewRank) || shuffleMode)
{
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
Dictionary<object, DHTItem> old = this.DHTContents;
this.DHTContents = new Dictionary<object, DHTItem>(1000);
foreach (KeyValuePair<object, DHTItem> item in old)
{
if (this.GetAffinityGroup(this.DHTKeyHash(item.Key)) == this.GetAffinityGroup(myNewRank))
{
this.DHTContents.Add(item.Key, item.Value);
}
}
}
}
}
private void DHTCleanup()
{
List<object> toDelete = new List<object>();
using (var tmpLockObj = new LockAndElevate(this.DHTDictLock))
{
long now = DateTime.UtcNow.Ticks;
foreach (KeyValuePair<object, DHTItem> kvp in this.DHTContents)
{
if ((now - kvp.Value.createTimestamp) / 10000L > this.myDHTItemTimeout)
{
toDelete.Add(kvp.Key);
}
}
foreach (object key in toDelete)
{
this.DHTContents.Remove(key);
}
}
if (!this.usesDHTDefaults)
{
foreach (object key in toDelete)
{
this.DHTExtWriter(key, null);
}
}
Vsync.OnTimer(this.myDHTItemTimeout, this.DHTCleanup);
}
private int GetAffinityGroup(Address a)
{
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
return (this.theView == null) ? -1 : this.GetAffinityGroup(this.theView, a);
}
}
internal int GetAffinityGroup(View v, Address a)
{
return v.GetRankOf(a) % this.myDHTnShards;
}
internal int GetAffinityGroup(int hash)
{
return Math.Abs(hash) % this.myDHTnShards;
}
internal int DHTKeyHash(object key)
{
return ((key is int) ? (int)key : key.GetHashCode()) % this.myDHTnShards;
}
internal List<Address> GetAgMembers(List<object> keys)
{
List<Address> mlist = new List<Address>();
bool[] IncludeAg = new bool[this.myDHTnShards];
foreach (object key in keys)
{
IncludeAg[this.GetAffinityGroup(this.DHTKeyHash(key))] = true;
}
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
for (int n = 0; n < theView.members.Length; n++)
{
if (IncludeAg[this.GetAffinityGroup(n)])
{
mlist.Add(theView.members[n]);
}
}
return mlist;
}
// Create something that looks like the group address but indicates the AffinityGroup via the port numbers
internal Address PseudoAddress(int AffinityGroup)
{
// Work on a copy so that the group's own address is not modified
Address pa = new Address(this.gaddr);
pa.p2pPort = pa.ackPort = AffinityGroup;
pa.cachedHashCode = 0;
return pa;
}
internal int GetAffinityGrpIdx(Address addr)
{
if (addr == Vsync.VSYNCMEMBERS.gaddr)
{
return -1;
}
Address a = new Address(addr);
Address mg = new Address(this.gaddr);
a.p2pPort = a.ackPort = a.cachedHashCode = 0;
mg.p2pPort = mg.ackPort = mg.cachedHashCode = 0;
if (mg == a)
{
return addr.p2pPort;
}
return -1;
}
/// <summary>
/// Uses the current group as a DHT and stores a new (key,value) pair, which overwrites any previous one.
/// </summary>
/// <param name="key">key for the object being stored</param>
/// <param name="value">value of that object</param>
/// <remarks>DHT put operations aren't totally ordered, hence concurrent DHTPut requests for the same key can leave the DHT in an inconsistent state.</remarks>
public void DHTPut(object key, object value)
{
this.DHTPut(new List<KeyValuePair<object, object>> { new KeyValuePair<object, object>(key, value) });
}
internal myHandlers theResolvers = new myHandlers();
/// <summary>
/// Polymorphic variant of SetDHTPutCollisionResolver.
/// </summary>
/// <typeparam name="KT">Key type</typeparam>
/// <typeparam name="VT">Value type</typeparam>
/// <typeparam name="VTIncoming">Value type for the incoming value that triggered the Put collision</typeparam>
/// <param name="theResolver">Method with signature VT theResolver(KT,VT,VTIncoming)</param>
/// <remarks>
/// This variant of SetDHTPutCollisionResolver allows you to specify multiple resolvers, each with
/// its own type signature. If you use this option, Vsync will call the first resolver that matches.
/// If none matches, Vsync will check to see if there is a universal resolver for the group.
/// </remarks>
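/// <example>
/// An illustrative sketch, assuming <c>g</c> is a hypothetical reference to the DHT-enabled group and that keys are strings and values are ints.
/// <code>
/// // Keep the larger of the stored and incoming values when two puts collide on the same key
/// g.SetDHTPutCollisionResolver&lt;string, int, int&gt;((key, stored, incoming) => Math.Max(stored, incoming));
/// </code>
/// For reference-typed values, a resolver that returns null causes the colliding key to be removed rather than updated.
/// </example>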
public void SetDHTPutCollisionResolver<KT, VT, VTIncoming>(Func<KT, VT, VTIncoming, VT> theResolver)
{
if (typeof(KT) == typeof(object) && typeof(VT) == typeof(object) && typeof(VTIncoming) == typeof(object))
{
this.myResolver = (dynamic)theResolver;
return;
}
this.theResolvers += theResolver;
}
/// <summary>
/// DHTPut for a single key,value pair
/// </summary>
/// <typeparam name="KT">Key type</typeparam>
/// <typeparam name="VT">Value type</typeparam>
/// <param name="key">key</param>
/// <param name="value">value</param>
public void DHTPut<KT, VT>(KT key, VT value)
{
this.DHTPut(new List<KeyValuePair<KT, VT>> { new KeyValuePair<KT, VT>(key, value) });
}
/// <summary>
/// Uses the current group as a DHT and stores a set of new (key,value) pairs, which overwrite any previous ones. Atomicity is not guaranteed
/// </summary>
/// <param name="kvps">A list of key-value pairs to put</param>
/// <typeparam name="KT">Key type</typeparam>
/// <typeparam name="VT">Value type</typeparam>
/// <remarks>DHT put operations aren't totally ordered, hence concurrent DHTPut requests for the same key can leave the DHT in an inconsistent state.</remarks>
public void DHTPut<KT, VT>(List<KeyValuePair<KT, VT>> kvps)
{
View theView;
using(var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (!this.VsyncCallStart())
{
return;
}
if (this.myDHTBinSize == 0)
{
throw new VsyncException("DHTPut: must first call DHTEnable");
}
if (this.DHTAgNonEmpty == null)
{
throw new VsyncException("DHTPut: group must be created and initial members should join before first call to DHTPut");
}
if (!this.HasFirstView || theView.members.Length < this.myDHTMinSize)
{
throw new VsyncDHTException("DHTPut: group is currently smaller (" + (theView == null ? 0 : theView.members.Length) + ") than the minimum required size (" + this.myDHTMinSize + ")");
}
foreach (KeyValuePair<KT, VT> kvp in kvps)
{
object key = kvp.Key;
int khash = this.DHTKeyHash(key);
FlowControl.FCBarrierCheck();
int targetAg = this.GetAffinityGroup(khash);
if (targetAg < 0 || targetAg >= this.DHTAgNonEmpty.Length)
{
throw new VsyncException("DHTPut: key " + key + " mapped to affinity group " + targetAg + " which was outside range of DHTAgNonEmpty[0.." + (this.DHTAgNonEmpty.Length - 1) + "]");
}
if (!this.DHTAgNonEmpty[targetAg])
{
Vsync.WriteLine("WARNING: DHTPut called on a <key,value> pair that maps to a depopulated affinity group (hint: maybe the underlying group or the replication factor is too small!)");
continue;
}
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
object value = kvp.Value;
Vsync.WriteLine("Application called DHTPut with key=" + key + ", value=" + value + ", HashCode = " + this.DHTKeyHash(key) + ", targetAg=" + targetAg);
}
try
{
ILock.NoteThreadState("Wedged.WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
if (this.myDHTBinSize < 20 || this.myDHTBinSize <= 2 * log2(theView.members.Length) || this.myDHTInDebugMode)
{
// Until the actual size of an affinity group gets fairly large, send p2p. But as the number of p2p sends pushes beyond log2, tunnelling will eventually
// be preferable despite its higher latency
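// Worked example with hypothetical numbers: if myDHTnShards is 4, the view has 10 members and targetAg is 1,
// the loop below visits off = 1, 5, 9 and stops at 13, so the members of rank 1, 5 and 9 each receive a copy.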
for (int off = targetAg; off < theView.members.Length; off += this.myDHTnShards)
{
this.P2PSend(theView.members[off], Vsync.IM_DHT_PUT, new List<KeyValuePair<object, DHTItem>> { new KeyValuePair<object, DHTItem>(kvp.Key, new DHTItem(kvp.Value)) });
}
}
else
{
// We use the PseudoAddress here as a kind of trick to avoid actually creating N/ReplicationFactor real groups
List<KeyValuePair<object, DHTItem>> list = new List<KeyValuePair<object, DHTItem>> { new KeyValuePair<object, DHTItem>(kvp.Key, new DHTItem(kvp.Value)) };
this.IPMCTunnel(this.PseudoAddress(this.GetAffinityGroup(khash)), Msg.toBArray(new object[] { Vsync.IM_DHT_PUT, list }));
}
}
finally
{
this.Wedged.Release();
}
}
this.VsyncCallDone();
}
/// <summary>
/// Treats the current group as a DHT and retrieves an object by key
/// </summary>
/// <param name="key"></param>
/// <returns>The value from the (key,value) pair</returns>
public VT DHTGet<KT, VT>(KT key)
{
List<KeyValuePair<KT, VT>> res = this.DHTGet<KT, VT>(new List<KT> { key });
if (res.Count > 0)
{
return res[0].Value;
}
return default(VT);
}
internal static Timeout DHTTimeout = new Timeout(15000, Timeout.TO_NULLREPLY);
/// <summary>
/// Treats the current group as a DHT and retrieves a list of objects by key. No promise of atomicity
/// with respect to DHTPut (if that's what you need, use DHTOrderedGet)
/// </summary>
/// <param name="ikeys"></param>
/// <typeparam name="KT"></typeparam>
/// <typeparam name="VT"></typeparam>
/// <returns>A list of (key,value) pairs for items that were found</returns>
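/// <example>
/// A usage sketch, assuming <c>g</c> is a Group with the DHT already enabled and large enough; the keys are arbitrary.
/// Keys for which no value comes back are simply absent from the result.
/// <code><![CDATA[
/// List<KeyValuePair<int, string>> found = g.DHTGet<int, string>(new List<int> { 1, 2, 3 });
/// foreach (KeyValuePair<int, string> kvp in found)
/// {
///     Console.WriteLine(kvp.Key + " -> " + kvp.Value);
/// }
/// ]]></code>
/// </example>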
public List<KeyValuePair<KT, VT>> DHTGet<KT, VT>(IEnumerable<KT> ikeys)
{
if (!this.VsyncCallStart())
{
return new List<KeyValuePair<KT, VT>>();
}
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
List<KT> keys = new List<KT>();
foreach (KT ik in ikeys)
{
keys.Add(ik);
}
if ((VsyncSystem.Debug & VsyncSystem.DHTS) != 0)
{
string ks = " ";
foreach (KT key in keys)
{
ks += key + " ";
}
Vsync.WriteLine("Application called DHTGet with keys { " + ks + "}");
}
if (this.myDHTBinSize == 0)
{
throw new VsyncException("DHTGet: must first call DHTEnable");
}
if (this.DHTAgNonEmpty == null)
{
throw new VsyncException("DHTGet: group must be created and initial members should join before first call to DHTGet");
}
if (!this.HasFirstView || this.GetSize() < this.myDHTMinSize)
{
throw new VsyncDHTException("DHTGet: group is currently smaller than the minimum required size");
}
List<List<KeyValuePair<KT, VT>>> partialResults = new List<List<KeyValuePair<KT, VT>>>();
IEnumerable<IGrouping<int, KT>> keysGroupedByAg = keys.GroupBy(key => this.GetAffinityGroup(this.DHTKeyHash(key)));
foreach (IGrouping<int, KT> GroupedKey in keysGroupedByAg)
{
if (this.myDHTBinSize == 0)
{
throw new VsyncException("DHTGet: must first call DHTEnable");
}
int N = theView.members.Length;
int myBase = theView.GetMyRank();
int startAt = rand.Next(N);
int log2N = log2(N);
int kgrp = GroupedKey.Key;
Address whoToAsk = null;
Address dontAsk = null;
// Note: this logic only makes sense for at most 2 tries!
for (int retry = 0; retry < 2 && dontAsk != Vsync.my_address; retry++)
{
// Start each try with a fresh choice, otherwise the retry would ask the member recorded in dontAsk again
whoToAsk = null;
try
{
if (this.GetAffinityGroup(myBase) == kgrp)
{
whoToAsk = Vsync.my_address;
}
else
{
// First check and see if there happens to be a 1-hop neighbor we could ask
for (int n = 0; whoToAsk == null && n < log2N; n++)
{
if (this.GetAffinityGroup((myBase + (1 << n)) % N) == kgrp && (dontAsk == null || theView.members[(myBase + (1 << n)) % N] != dontAsk))
{
whoToAsk = theView.members[(myBase + (1 << n)) % N];
}
}
}
// If no luck, check starting at some random location and just take anyone in the right affinity group
for (int n = 0; whoToAsk == null && n < theView.members.Length; n++)
{
if (this.GetAffinityGroup((startAt + n) % N) == kgrp && (dontAsk == null || theView.members[(startAt + n) % N] != dontAsk))
{
whoToAsk = theView.members[(startAt + n) % N];
}
}
if (whoToAsk == null)
{
throw new VsyncException("DHT can't find anyone to ask about keys mapping to DHT affinity group " + kgrp);
}
List<KT> gk = new List<KT>();
foreach (KT grouped in GroupedKey)
{
gk.Add(grouped);
}
try
{
ILock.NoteThreadState("Wedged(DHT).WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
if (!this.P2PQuery(whoToAsk, DHTTimeout, Vsync.IM_DHT_GET, gk, EOL, partialResults))
{
// Two cases: He may have done a nullreply, or he may have actually failed.
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (theView.GetRankOf(whoToAsk) == -1)
{
dontAsk = whoToAsk;
}
}
else
{
// Got replies for this affinity group, so the second try isn't needed
break;
}
}
finally
{
this.Wedged.Release();
}
}
catch (VsyncAbortReplyException)
{
Vsync.WriteLine("DHTGet: AbortReplyException!");
dontAsk = whoToAsk;
}
}
}
Dictionary<KT, VT> deDuped = new Dictionary<KT, VT>(1000);
List<KeyValuePair<KT, VT>> results = new List<KeyValuePair<KT, VT>>();
foreach (List<KeyValuePair<KT, VT>> kvpl in partialResults)
{
foreach (KeyValuePair<KT, VT> res in kvpl)
{
if (!deDuped.ContainsKey(res.Key))
{
deDuped.Add(res.Key, res.Value);
}
}
}
foreach (KeyValuePair<KT, VT> res in deDuped)
{
results.Add(res);
}
this.VsyncCallDone();
return results;
}
/// <summary>
/// Atomically and with total order, puts a key,value pair into the DHT
/// </summary>
/// <param name="key"></param>
/// <param name="value"></param>
/// <typeparam name="KT"></typeparam>
/// <typeparam name="VT"></typeparam>
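/// <example>
/// A usage sketch, assuming <c>g</c> is a Group with the DHT already enabled; the key and value are arbitrary.
/// This variant is the one to reach for when several members may write the same key concurrently, since
/// plain DHTPut is not totally ordered.
/// <code><![CDATA[
/// g.DHTOrderedPut(42, "forty-two");
/// List<KeyValuePair<int, string>> found = g.DHTOrderedGet<int, string>(new List<int> { 42 });
/// ]]></code>
/// </example>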
public void DHTOrderedPut<KT, VT>(KT key, VT value)
{
if (!this.VsyncCallStart())
{
return;
}
this.OrderedSend(this.GetAgMembers(new List<object> { key }), Vsync.IM_DHT_PUT, new List<KeyValuePair<object, DHTItem>> { new KeyValuePair<object, DHTItem>(key, new DHTItem(value)) });
this.VsyncCallDone();
}
/// <summary>
/// Atomically and with total order, puts a set of key,value pairs into the DHT
/// </summary>
/// <param name="kvps"></param>
/// <typeparam name="KT"></typeparam>
/// <typeparam name="VT"></typeparam>
public void DHTOrderedPut<KT, VT>(List<KeyValuePair<KT, VT>> kvps)
{
if (!this.VsyncCallStart())
{
return;
}
List<KeyValuePair<KT, DHTItem>> newList = new List<KeyValuePair<KT, DHTItem>>();
foreach (KeyValuePair<KT, VT> kvp in kvps)
{
newList.Add(new KeyValuePair<KT, DHTItem>(kvp.Key, new DHTItem(kvp.Value)));
}
if (newList.Count > 0)
{
this.OrderedSend(this.GetAgMembers(kvps.Select(kvp => (object)kvp.Key).ToList()), Vsync.IM_DHT_PUT, newList);
}
this.VsyncCallDone();
}
/// <summary>
/// Atomically and with total order, removes a set of keys from the DHT
/// </summary>
/// <param name="keys"></param>
/// <typeparam name="KT"></typeparam>
public void DHTOrderedRemove<KT>(IEnumerable<KT> keys)
{
if (!this.VsyncCallStart())
{
return;
}
byte[] b0 = new byte[0];
List<KeyValuePair<KT, byte[]>> toSend = new List<KeyValuePair<KT, byte[]>>();
foreach (KT key in keys)
{
toSend.Add(new KeyValuePair<KT, byte[]>(key, b0));
}
if (toSend.Count > 0)
{
this.OrderedSend(this.GetAgMembers(keys.Select(k => (object)k).ToList()), Vsync.IM_DHT_PUT, toSend);
}
this.VsyncCallDone();
}
/// <summary>
/// Removes the value associated with some set of keys. Atomicity is not guaranteed
/// </summary>
/// <param name="keys"></param>
/// <typeparam name="KT"></typeparam>
/// <remarks>DHT operations are reliable but not totally ordered, hence DHTRemove for a key shouldn't be issued concurrently with DHTPut operations using the identical key.</remarks>
public void DHTRemove<KT>(IEnumerable<KT> keys)
{
if (!this.VsyncCallStart())
{
return;
}
foreach (KT key in keys)
{
this.DHTPut(key, new byte[0]);
}
this.VsyncCallDone();
}
/// <summary>
/// Removes the value associated with the key.
/// </summary>
/// <param name="key"></param>
/// <typeparam name="KT"></typeparam>
/// <remarks>DHT operations are reliable but not totally ordered, hence DHTRemove for a key shouldn't be issued concurrently with DHTPut operations using the identical key.</remarks>
public void DHTRemove<KT>(KT key)
{
if (!this.VsyncCallStart())
{
return;
}
this.DHTPut(key, new byte[0]);
this.VsyncCallDone();
}
/// <summary>
/// Performs a virtually synchronous DHTGet operation for a set of keys, returning a list of (key,value) pairs, deduplicated by key
/// </summary>
/// <param name="keys"></param>
/// <typeparam name="KT"></typeparam>
/// <typeparam name="VT"></typeparam>
/// <returns>A list of (key,value) pairs for items that were found</returns>
public List<KeyValuePair<KT, VT>> DHTOrderedGet<KT, VT>(IEnumerable<KT> keys)
{
if (!this.VsyncCallStart())
{
return new List<KeyValuePair<KT, VT>>();
}
List<List<KeyValuePair<KT, VT>>> results = new List<List<KeyValuePair<KT, VT>>>();
List<KT> toSend = new List<KT>();
foreach (KT key in keys)
{
toSend.Add(key);
}
this.OrderedQuery(ALL, DHTTimeout, new QueryKey<KT>(Vsync.my_address, keys, false), Vsync.IM_DHT_GET, toSend, EOL, results);
Dictionary<KT, VT> deDuped = new Dictionary<KT, VT>(1000);
foreach (List<KeyValuePair<KT, VT>> rl in results)
{
foreach (KeyValuePair<KT, VT> res in rl)
{
if (!deDuped.ContainsKey(res.Key))
{
deDuped.Add(res.Key, res.Value);
}
}
}
List<KeyValuePair<KT, VT>> rv = new List<KeyValuePair<KT, VT>>();
foreach (KeyValuePair<KT, VT> res in deDuped)
{
rv.Add(res);
}
this.VsyncCallDone();
return rv;
}
/// <summary>
/// Performs a DHTGet operation for a single key, returning the value or default(VT) if the key is not found
/// </summary>
/// <typeparam name="KT"></typeparam>
/// <typeparam name="VT"></typeparam>
/// <param name="key"></param>
/// <returns>The value from the (key,value) pair, or default(VT) if none was found</returns>
/// <remarks>In this case DHTOrderedGet and DHTGet are actually identical</remarks>
public VT DHTOrderedGet<KT, VT>(KT key)
{
return this.DHTGet<KT, VT>(key);
}
internal static Dictionary<Address, bool[]> hmInfo = new Dictionary<Address, bool[]>(1000);
internal static LockObject hmLock = new LockObject("hmLock");
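// log2 returns floor(log2(N)) for N >= 1. Values traced by hand from the loop below:
// log2(1) = 0, log2(5) = 2, log2(8) = 3, log2(9) = 3. For N <= 0 the loop body never runs and the result is -1.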
internal static int log2(int N)
{
int i;
for (i = 0; (1 << i) < N; i++)
{
}
// Rounds down
if ((1 << i) == N)
{
return i;
}
return i - 1;
}
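// log2RU is the round-up counterpart, i.e. ceil(log2(N)) for N >= 1. Traced by hand:
// log2RU(1) = 0, log2RU(5) = 3, log2RU(8) = 3, log2RU(9) = 4.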
internal static int log2RU(int N)
{
int i;
for (i = 0; (1 << i) < N; i++)
{
}
return i;
}
private static string PHMap(bool[] hm, int nb = int.MaxValue)
{
string bs = " ";
foreach (bool b in hm)
{
if (nb-- > 0)
{
bs += (b ? "+" : "-") + " ";
}
}
return bs;
}
internal class ddh
{
internal Address dest;
internal byte[] data;
internal int hopcnt;
internal ddh(Address d, byte[] da, int h)
{
this.dest = d;
this.data = da;
this.hopcnt = h;
}
}
internal class gsdh
{
internal Address gaddr;
internal Address sender;
internal byte[] data;
internal int hopcnt;
internal gsdh(Address g, Address s, byte[] da, int h)
{
this.gaddr = g;
this.sender = s;
this.data = da;
this.hopcnt = h;
}
}
internal static LockObject TunnelThreadsLock = new LockObject("TunnelThreadsLock");
internal static Thread IPMCTunnelThread;
internal static Thread UDPTunnelThread;
internal static List<ddh> ddhList = new List<ddh>();
internal static LockObject ddhLock = new LockObject("ddhLock", ThreadPriority.AboveNormal);
internal Semaphore ddhNotEmpty = new Semaphore(0, int.MaxValue);
internal static List<gsdh> gsdhList = new List<gsdh>();
internal static LockObject gsdhLock = new LockObject("gsdhLock");
internal Semaphore gsdhNotEmpty = new Semaphore(0, int.MaxValue);
internal static int nInUDPTunnel()
{
using (var tmpLockObj = new LockAndElevate(ddhLock))
{
return ddhList.Count;
}
}
internal static int nInIPMCTunnel()
{
using (var tmpLockObj = new LockAndElevate(gsdhLock))
{
return gsdhList.Count;
}
}
internal class IPMCVinfo
{
internal int IMVid;
internal Address gaddr;
internal Address sender;
internal View v;
internal IPMCVinfo(int vid, Address ga, Address s, View theView)
{
this.IMVid = vid;
this.gaddr = ga;
this.sender = s;
this.v = theView;
}
}
// These need to spawn threads because they can't risk blocking on the looped-back deliveries via the RECVB bounded buffer
internal void SetupIMTunnels()
{
this.doRegister(Vsync.BECLIENT, new Action<Address>(this.becomeClientOf));
this.doRegister(Vsync.IM_UDP_TUNNEL, new Action<Address, byte[], int>((dest, data, hopcnt) =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDP Tunnel event handler: got a new UDP packet tunnelling to dest=" + dest + ", data.Length=" + data.Length + ", hopcount is " + hopcnt);
}
using (var tmpLockObj = new LockAndElevate(ddhLock))
{
ddhList.Add(new ddh(dest, data, hopcnt));
this.ddhNotEmpty.Release();
}
using (var tmpLockObj = new LockAndElevate(TunnelThreadsLock))
{
if (UDPTunnelThread == null)
{
UDPTunnelThread = new Thread(() =>
{
try
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[6]++;
this.ddhNotEmpty.WaitOne(1000);
ddh dh;
using (var tmpLockObj1 = new LockAndElevate(ddhLock))
{
dh = ddhList.FirstOrDefault();
if (dh != null)
{
ddhList.Remove(dh);
}
}
if (dh != null)
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDP Tunnel event handler calling UDPTunnel: dest=" + dh.dest + ", data.Length=" + dh.data.Length);
}
this.UDPTunnel(dh.dest, dh.data, dh.hopcnt + 1);
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "IM_UDP_TUNNEL thread", Priority = ThreadPriority.AboveNormal, IsBackground = true };
UDPTunnelThread.Start();
}
}
}));
this.doRegister(Vsync.IM_IPMC_TUNNEL, new Action<Address, Address, byte[], int>((gaddr, sender, data, hopcnt) =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel[ql=" + gsdhList.Count + "] got a new incoming request: this.gaddr=" + this.gaddr + ", dest gaddr=" + gaddr + ", sender=" + sender + ", data length=" + data.Length + ", hopcnt=" + hopcnt);
}
using (var tmpLockObj = new LockAndElevate(gsdhLock))
{
gsdhList.Add(new gsdh(gaddr, sender, data, hopcnt));
}
this.gsdhNotEmpty.Release();
using (var tmpLockObj = new LockAndElevate(TunnelThreadsLock))
{
if (IPMCTunnelThread == null)
{
IPMCTunnelThread = new Thread(() =>
{
try
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[7]++;
this.gsdhNotEmpty.WaitOne(1000);
gsdh gdh;
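using (var tmpLockObj1 = new LockAndElevate(gsdhLock))
{
gdh = gsdhList.FirstOrDefault();
if (gdh != null)
{
// Remove while still holding gsdhLock, as the UDP tunnel thread does with ddhList
gsdhList.Remove(gdh);
}
}
if (gdh != null)
{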
int ag = this.GetAffinityGrpIdx(gdh.gaddr);
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel[ql=" + gsdhList.Count + "] event handler: this.gaddr=" + this.gaddr + ", dest gaddr=" + gdh.gaddr + ((ag == -1) ? string.Empty : ("(affinity group " + ag + ")")) + ", sender=" + gdh.sender + ", data.Length=" + gdh.data.Length);
}
IPMCNextHops(gdh.gaddr, gdh.sender, gdh.hopcnt, (nextHop, nToScan) =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel event handler: nextHop will be " + nextHop + " for gaddr=" + ((ag == -1) ? gdh.gaddr.ToString() : ("(affinity group " + ag + ")")) + ", sender=" + gdh.sender + ", data.Length=" + gdh.data.Length);
}
Vsync.VSYNCMEMBERS.doP2PSend(nextHop, true, Vsync.IM_IPMC_TUNNEL, gdh.gaddr, gdh.sender, gdh.data, nToScan);
});
if ((ag == -1 && Group.doLookup(gdh.gaddr) != null) || (ag >= 0 && Vsync.my_address.GetHashCode() % this.myDHTnShards == ag))
{
IPMCTunnelLoopback(gdh.gaddr, gdh.data);
}
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "IM_IPMC_TUNNEL thread", Priority = ThreadPriority.AboveNormal, IsBackground = true };
IPMCTunnelThread.Start();
}
}
}));
BoundedBuffer IPMCbb = new BoundedBuffer("IPMC:BB", 256, ILock.LLIPMC, 0, 1);
new Thread(() =>
{
try
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[8]++;
IPMCVinfo vi = (IPMCVinfo)IPMCbb.get();
if (vi == null)
{
break;
}
IPMCViewCast(vi.IMVid, vi.gaddr, vi.sender, vi.v);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "IM_IPMCViewCast_TUNNEL thread", IsBackground = true }.Start();
this.doRegister(Vsync.IM_IPMC_TUNNEL, new Action<int, Address, Address, int, View>((IMVid, gaddr, sender, hopcnt, v) =>
{
if (Vsync.VSYNCMEMBERS.HasFirstView && Vsync.VSYNCMEMBERS.theView.viewid == IMVid)
{
IPMCbb.put(new IPMCVinfo(IMVid, gaddr, sender, v));
}
else if (!Vsync.VSYNCMEMBERS.HasFirstView || Vsync.VSYNCMEMBERS.theView.viewid < IMVid)
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.ViewLock))
{
Vsync.VSYNCMEMBERS.stashedIPMCviews.Add(new IPMCVinfo(IMVid, gaddr, sender, v));
}
}
IPMCNextHops(gaddr, sender, hopcnt, (nextHop, nToScan) => Vsync.VSYNCMEMBERS.doP2PSend(nextHop, true, Vsync.IM_IPMC_TUNNEL, gaddr, sender, nToScan, v));
}));
this.doRegister(Vsync.IM_IPMC_VIEWS, new Action<View[]>(vs =>
{
foreach (View v in vs)
{
IPMCNewView(v.gaddr, v);
}
}));
}
internal void replayStashedVinfo()
{
List<IPMCVinfo> newstash = new List<IPMCVinfo>(), toReplay = new List<IPMCVinfo>();
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.ViewLock))
{
View IMv = Vsync.VSYNCMEMBERS.theView;
foreach (IPMCVinfo vi in this.stashedIPMCviews)
{
if (vi.IMVid > IMv.viewid)
{
newstash.Add(vi);
}
else if (vi.IMVid == IMv.viewid)
{
toReplay.Add(vi);
}
}
this.stashedIPMCviews = newstash;
}
if (toReplay.Count > 0)
{
new Thread(() =>
{
try
{
foreach (IPMCVinfo vi in toReplay)
{
IPMCViewCast(vi.IMVid, vi.gaddr, vi.sender, vi.v);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "ReplayStashedVinfo", IsBackground = true }.Start();
}
}
// Just sends directly
internal void UDPTunnel(Address dest, byte[] data, int hopcnt)
{
Vsync.VSYNCMEMBERS.UDPTunnel(dest, 0, data, hopcnt);
}
internal class ShortCutInfo
{
internal Address who;
internal Address gaddr;
internal ShortCutInfo(Address w, Address g)
{
this.who = w;
this.gaddr = g;
}
}
internal static List<ShortCutInfo> AllShortCuts = new List<ShortCutInfo>();
internal static LockObject ShortCutsLock = new LockObject("ShortCutsLock", ThreadPriority.Highest);
internal Dictionary<Address, bool> ShortCuts = new Dictionary<Address, bool>(1000);
internal void UpdateShortCuts(View v)
{
List<ShortCutInfo> scl = new List<ShortCutInfo>();
Group g = Group.doLookup(v.gaddr);
if (g == null)
{
return;
}
if (g == Vsync.ORACLE && Vsync.ClientOf == null)
{
foreach (Address a in g.theView.members)
{
if (!a.isMyAddress())
{
scl.Add(new ShortCutInfo(a, g.gaddr));
}
}
}
else
{
tokenInfo theToken;
using (var tmpLockObj = new LockAndElevate(g.TokenLock))
{
theToken = g.theToken;
}
if ((g.flags & Group.G_ISLARGE) != 0 && theToken != null)
{
for (int level = 0; level < theToken.nlevels; level++)
{
if (theToken.next[level] != null && !theToken.next[level].isMyAddress())
{
scl.Add(new ShortCutInfo(theToken.next[level], v.gaddr));
}
if (theToken.last[level] != null && !theToken.last[level].isMyAddress())
{
scl.Add(new ShortCutInfo(theToken.last[level], v.gaddr));
}
}
}
}
using (var tmpLockObj = new LockAndElevate(ShortCutsLock))
{
this.ShortCuts = new Dictionary<Address, bool>(1000);
foreach (ShortCutInfo sci in scl)
{
if (!this.ShortCuts.ContainsKey(sci.who))
{
this.ShortCuts.Add(sci.who, true);
}
}
foreach (ShortCutInfo sci in AllShortCuts)
{
if (sci.gaddr != v.gaddr)
{
scl.Add(sci);
}
}
AllShortCuts = scl;
}
}
// Sends but "jiggers" the route for retransmission attempts
internal void UDPTunnel(Address dest, int jigger, byte[] data, int hopcnt)
{
if (hopcnt > 10)
{
if ((VsyncSystem.Debug & VsyncSystem.DISCARDS) != 0)
{
Vsync.WriteLine("WARNING: Discarding a packet in UDPTunnel: hopcnt>10");
}
using (var tmpLockObj = new LockAndElevate(VsyncSystem.RTS.Lock))
{
++VsyncSystem.RTS.Discarded;
}
return;
}
View v = null;
if (!Vsync.VSYNC_LARGE)
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.ViewLock))
{
v = Vsync.VSYNCMEMBERS.theView;
}
}
else
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.TokenLock))
{
if (Vsync.VSYNCMEMBERS.theToken != null)
{
v = Vsync.VSYNCMEMBERS.theToken.WorkingView;
}
}
}
if (v == null)
{
if ((VsyncSystem.Debug & VsyncSystem.DISCARDS) != 0)
{
Vsync.WriteLine("WARNING: Discarding a packet in UDPTunnel: the working view is null");
}
using (var tmpLockObj = new LockAndElevate(VsyncSystem.RTS.Lock))
{
++VsyncSystem.RTS.Discarded;
}
return;
}
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDP Tunnel sender method: dest=" + dest + ", jigger=" + jigger + " data length=" + data.Length);
}
if (dest.isMyAddress())
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDP Tunnel: I'm the destination! loop back");
}
IPMCTunnelLoopback(this.gaddr, data);
return;
}
int mybaseRank = v.GetMyRank();
int destRank = v.GetRankOf(dest);
int N = v.members.Length;
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("MybaseRank=" + mybaseRank + ", destRank=" + destRank + ", N=" + N);
}
// This next line is the core of the UDP emulation layer: we forward the packet in hops.
// Start by assuming jigger is always 0.
// For example, to reach the member with rank 5 from a sender who happens to have rank 0 in a group of size 9,
// this code computes log2((9-0+5) mod 9), which is log2(5), hence 2 (log2 rounds down). Call this k. The code
// then forwards the packet to the member at offset (1<<k) from the sender, which is member 4.
// Member 4 receives it and computes log2((9-4+5)%9), which is log2(1), so k=0. So member 4
// forwards it to member 5.
// Now think about the same scenario but sending from 5 back to 0, same group of size 9.
// First we compute (9-5+0)%9, and we end up with 4.
// Now we compute log2(4)=2, then 1<<2, which is 4 again.
// Add 5 back in and we have 9, mod 9, giving 0. So we forward directly from member 5 to member 0.
// This has the effect of maintaining about log(N) links from each node in VSYNCMEMBERS to the nodes at power-of-two offsets (1, 2, 4, ...) to its "right",
// wrapping as we get to N. So we need log(N) TCP links and can reach all N members in log(N) hops, worst case.
// The "jigger" parameter is used when retransmitting UDP packets to avoid having a single failure "disconnect" the whole subtree of nodes below
// the failed node. When retransmitting, a non-zero jigger forces us to use the "wrong" first hop. The second and subsequent hops
// still follow an optimal route, but it isn't always the same route. To do this we just change which of the log(N) outgoing links we use for the
// first hop, unless the first hop would have taken us directly to the target node.
// So... the "trueTargetOffset" looks at how far the actual destination is to the "right" of me in the view
int trueTargetOffset = (N - mybaseRank + destRank) % N;
// The base-2 log of the offset rounds down and determines which TCP overlay link we'll use: the one at offset 1, 2, 4, ...
int l2OfTrueOffset = log2(trueTargetOffset);
// That TCP overlay link takes us to some node, call it the "selected" target. Compute its offset to the right of me
int selectedTargetOffset = 1 << l2OfTrueOffset;
// If we're retransmitting and the target isn't the next hop...
if (jigger != 0 && selectedTargetOffset != trueTargetOffset)
{
// Carefully apply the jigger, making sure we don't accidentally exceed log2(N)
selectedTargetOffset = 1 << ((l2OfTrueOffset + jigger) % (log2(N) + 1));
}
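// Worked example of the jigger (numbers chosen for illustration, and assuming no ShortCuts entry for dest):
// N = 9, I am rank 0, dest is rank 5, jigger = 1.
// trueTargetOffset = 5, l2OfTrueOffset = 2 and log2(9) = 3, so selectedTargetOffset = 1 << ((2 + 1) % 4) = 8:
// the retransmission leaves via member 8 instead of member 4. Forwarding nodes use jigger 0, so member 8
// sees offset (9 - 8 + 5) % 9 = 6 and forwards to member (8 + 4) % 9 = 3, which is at offset 2 from
// member 5 and sends to it directly.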
// Now apply the selected target offset to my own base rank, mod N, and that gives the node to which we'll send the UDP packet
bool isShortCut;
using (var tmpLockObj = new LockAndElevate(ShortCutsLock))
{
this.ShortCuts.TryGetValue(dest, out isShortCut);
}
if (mybaseRank == -1 || destRank == -1 || trueTargetOffset == (1 << l2OfTrueOffset) || isShortCut)
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDPTunnel... ReliableSender.P2PSend: dest=" + dest + ", mybaseRank=" + mybaseRank + ", destRank=" + destRank + ", using direct TCPSendTo");
}
ReliableSender.P2PSend(dest, dest.p2pPort, data, ReliableSender.RECVBB);
return;
}
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("UDPTunnel.... ReliableSender.doP2PSend: dest= " + dest + "... computed target " + v.members[(mybaseRank + selectedTargetOffset) % N] + ", using mybaseRank=" + mybaseRank + ", destRank=" + destRank + ", N=" + N + ", trueTargetOffset=" + trueTargetOffset + ", l2ofTrueTargetOffset=" + l2OfTrueOffset + ", selectedTargetOffset=" + selectedTargetOffset);
}
Address target = v.members[(mybaseRank + selectedTargetOffset) % N];
this.doP2PSend(target, true, Vsync.IM_UDP_TUNNEL, dest, data, hopcnt);
}
internal void IPMCTunnel(Address gaddr, byte[] data)
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel sender method: this.gaddr=" + this.gaddr + ", argument gaddr=" + gaddr + ", data.Length=" + data.Length);
}
if (Group.doLookup(gaddr) != null)
{
IPMCTunnelLoopback(gaddr, data);
}
IPMCNextHops(gaddr, Vsync.my_address, int.MaxValue, (nextHop, nToScan) =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel next hop will be via " + nextHop + " for message to gaddr=" + gaddr + ", data.Length=" + data.Length);
}
Vsync.VSYNCMEMBERS.doP2PSend(nextHop, true, Vsync.IM_IPMC_TUNNEL, gaddr, Vsync.my_address, data, nToScan);
});
}
private static void IPMCTunnelLoopback(Address gaddr, byte[] data)
{
object[] obs = Msg.BArrayToObjects(data);
Group g = Group.Lookup(gaddr);
g.doP2PSend(Vsync.my_address, true, (int)obs[0], obs[1]);
}
internal static void IPMCViewCast(int IMVid, Address gaddr, Address sender, View v)
{
View IMv = null;
if (!Vsync.VSYNC_LARGE)
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.ViewLock))
{
IMv = Vsync.VSYNCMEMBERS.theView;
}
}
else
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.TokenLock))
{
if (Vsync.VSYNCMEMBERS.theToken != null)
{
IMv = Vsync.VSYNCMEMBERS.theToken.WorkingView;
}
}
}
if (IMv == null || IMv.viewid != IMVid)
{
return;
}
if (gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
// When membership changes, we need to reset all the forwarding tables because the algorithm is dependent on the view of VSYNCMEMBERS
if (Vsync.ClientOf == null && Vsync.ORACLE.HasFirstView && Vsync.ORACLE.theView.GetMyRank() == 0)
{
List<View> theViews = new List<View>();
using (var tmpLockObj = new LockAndElevate(Group.TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in Group.TPGroups)
{
Group tpg = kvp.Value;
if (tpg.theView != null && tpg.gaddr != Vsync.VSYNCMEMBERS.gaddr)
{
theViews.Add(tpg.theView);
}
}
}
Vsync.VSYNCMEMBERS.doSend(false, false, Vsync.IM_IPMC_VIEWS, theViews.ToArray());
}
Vsync.VSYNCMEMBERS.replayStashedVinfo();
return;
}
// This case is used when some single group has a membership change. Logic requires that these P2P messages be delivered after the IM_IPMC_VIEWS multicast
// associated with the most recent VSYNCMEMBERS membership update.
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMCViewCast[" + Vsync.my_address + "]: gaddr=" + gaddr + ", sender=" + sender + ", View=" + v);
}
IPMCNewView(gaddr, v);
IPMCNextHops(gaddr, sender, int.MaxValue, (nextHop, nToScan) =>
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMCViewCast[" + Vsync.my_address + "] next hop will be via " + nextHop + " for gaddr=" + gaddr + ", sender=" + sender + ", View=" + v);
}
if (nextHop == sender)
{
throw new VsyncException("IPMCNextHops: trying to forward back to the originator of a multicast!");
}
Vsync.VSYNCMEMBERS.doP2PSend(nextHop, true, Vsync.IM_IPMC_TUNNEL, IMv.viewid, gaddr, sender, nToScan, v);
});
}
// This guy does a callback to the designated action routine for each next hop the data should take
internal static void IPMCNextHops(Address gaddr, Address sender, int nToScan, Action<Address, int> action)
{
if (!Vsync.VSYNCMEMBERS.HasFirstView)
{
if ((VsyncSystem.Debug & VsyncSystem.DISCARDS) != 0)
{
Vsync.WriteLine("DISCARD (no first view): IPMCNextHops: gaddr=" + gaddr + ", sender=" + sender + ", nToScan=" + nToScan);
}
return;
}
View v = null;
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.TokenLock))
{
if (Vsync.VSYNCMEMBERS.theToken != null)
{
v = Vsync.VSYNCMEMBERS.theToken.WorkingView;
}
}
if (v == null)
{
using (var tmpLockObj = new LockAndElevate(Vsync.VSYNCMEMBERS.ViewLock))
{
v = Vsync.VSYNCMEMBERS.theView;
}
}
int myBase = v.GetMyRank();
int sbase = v.GetRankOf(sender);
int N = v.members.Length;
int ag = Vsync.VSYNCMEMBERS.GetAffinityGrpIdx(gaddr);
if (sbase <= myBase)
{
sbase += N;
}
bool[] hm = null;
if (gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
hm = new bool[log2(v.members.Length) + 1];
// With each hop deeper into the tree, halve the link span that we'll look at
if (nToScan > hm.Length)
{
nToScan = hm.Length;
}
for (int b = 0; b < nToScan; b++)
{
hm[b] = true;
}
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel: dest is VSYNCMEMBERS, setting hm = {" + PHMap(hm) + "}");
}
}
else if (ag >= 0)
{
hm = new bool[Vsync.VSYNCMEMBERS.myDHTnShards];
if (nToScan > hm.Length)
{
nToScan = hm.Length;
}
for (int b = 0; b < nToScan; b++)
{
hm[b] = Vsync.VSYNCMEMBERS.DHTHMaps[ag, b];
}
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel: is an artificial DHT affinity subgroup with hm = {" + PHMap(hm) + "}");
}
}
else
{
bool[] hmm;
using (var tmpLockObj = new LockAndElevate(hmLock))
{
hmInfo.TryGetValue(gaddr, out hmm);
}
if (hmm != null)
{
hm = new bool[hmm.Length];
if (nToScan > hm.Length)
{
nToScan = hm.Length;
}
for (int b = 0; b < nToScan; b++)
{
hm[b] = hmm[b];
}
}
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel: is an artificial DHT affinity subgroup of an application group, hence using hm = {" + PHMap(hm) + "}");
}
}
if (hm == null || myBase == -1)
{
if ((VsyncSystem.Debug & (VsyncSystem.TUNNELING | VsyncSystem.DISCARDS)) != 0)
{
Vsync.WriteLine("WARNING: IPMC tunnel ignoring a packet with gaddr=" + gaddr + ", sender=" + sender + ", pseudoDepth=" + nToScan + " (couldn't find the group, or my own rank)");
}
using (var tmpLockObj = new LockAndElevate(VsyncSystem.RTS.Lock))
{
VsyncSystem.RTS.Discarded++;
}
return;
}
// The idea below is pretty simple, but sometimes simple things are hard to write down. Suppose that my rank is 0 and I want to broadcast.
// I would send a packet on every link I'm allowed to use, namely to the member at 1, 2, ... N/2
// Notice that this would be log2(N) packets sent. But to whom should each receiver forward?
// At the next "depth" in the forwarding tree, the guy I sent to via my first link out shouldn't forward: he doesn't look at "any" links
// The guy I reached via my second link out should forward only on his own first link: he looks at link 0
// The guy I reached via my third link out should forward on his first two links, etc.
// Next level down repeats this idea
// So we get a kind of tree: shallow on its left, log2(N) deep if we follow its extreme right side to the bottom
// Any given multicast always tracks down the binary tree that we saw "looking down" from its sender (the root)
// Surprisingly, the actual rule used is pretty simple, once you visualize it this way
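// Worked example (traced by hand, assuming gaddr is VSYNCMEMBERS so every hm entry below nToScan is true): N = 8, sender is rank 0.
// Rank 0 sends to ranks 1, 2 and 4, passing nToScan = 0, 1 and 2 respectively (offset 8 would wrap back to the sender and is skipped).
// Rank 1 (nToScan 0) forwards to nobody. Rank 2 (nToScan 1) forwards to rank 3. Rank 4 (nToScan 2) forwards to ranks 5 and 6.
// Rank 6 (nToScan 1) forwards to rank 7. All 8 members are reached, and the deepest path 0 -> 4 -> 6 -> 7 has log2(8) = 3 hops.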
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
string slist = " ";
for (int i = 1; i < v.members.Length; i <<= 1)
{
slist += v.members[(myBase + i) % N] + " ";
}
Vsync.WriteLine("IPMC Tunnel: gaddr=" + gaddr + ", sender=" + sender + ", myBase=" + myBase + ", sbase=" + sbase + ", nToScan=" + nToScan + ", v.members=" + Address.VectorToString(v.members) + ", hm = {" + PHMap(hm) + "} in skiplist= {" + slist + "}");
}
for (int k = 0; k < hm.Length; k++)
{
if (hm[k])
{
if ((myBase + (1 << k)) < sbase)
{
if ((VsyncSystem.Debug & VsyncSystem.TUNNELING) != 0)
{
Vsync.WriteLine("IPMC Tunnel: Sending a copy to v.members[" + ((myBase + (1 << k)) % N) + "]=" + v.members[(myBase + (1 << k)) % N]);
}
action(v.members[(myBase + (1 << k)) % N], k);
}
}
}
}
internal static LockObject IPMCNewViewLock = new LockObject("IPMCNewViewLock");
internal static View TunnelView;
internal static void IPMCNewView(Address gaddr, View v)
{
using (var tmpLockObj = new LockAndElevate(IPMCNewViewLock))
{
if (gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
if (TunnelView != null && v.viewid <= TunnelView.viewid)
{
return;
}
TunnelView = v;
// VSYNCMEMBERS view changed: recompute everything
Dictionary<Address, bool[]> oldhmInfo;
using (var tmpLockObj1 = new LockAndElevate(hmLock))
{
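// Save the old map before replacing it, so that the loop below can recompute every group it covered
oldhmInfo = hmInfo;
hmInfo = new Dictionary<Address, bool[]>(1000);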
}
doIPMCNewView(gaddr, v);
foreach (KeyValuePair<Address, bool[]> kvp in oldhmInfo)
{
if (kvp.Key == gaddr)
{
continue;
}
Group g = Group.Lookup(kvp.Key) ?? TrackingProxyLookup(kvp.Key);
if (g != null && g.HasFirstView)
{
doIPMCNewView(kvp.Key, g.theView);
}
}
}
else
{
doIPMCNewView(gaddr, v);
}
}
}
// This housekeeping method updates the skiplist: true if I have any members down the link
// from me to the guy at offset 2^k to my right, for k=0...log2(N), false if not
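// Illustration (a sketch, assuming log2() rounds down and a view of N = 8 members in which I hold rank 2):
//   offset 1 (rank 3) sets hasMembers[0]
//   offsets 2..3 (ranks 4 and 5) set hasMembers[1]
//   offsets 4..7 (ranks 6, 7, 0 and 1) set hasMembers[2]
// Offsets wrap around modulo N, so the members "to my right" include those with ranks below mine.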
internal static void doIPMCNewView(Address gaddr, View v)
{
int myBase = v.GetMyRank();
int N = v.members.Length;
if (myBase == -1)
{
using (var tmpLockObj = new LockAndElevate(hmLock))
{
if (hmInfo.ContainsKey(gaddr))
{
hmInfo.Remove(gaddr);
}
}
return;
}
bool[] hasMembers = new bool[log2(N) + 1];
foreach (Address m in v.members)
{
if (!m.isMyAddress())
{
for (int rankOf = 0; rankOf < N; rankOf++)
{
if (m == v.members[(rankOf + myBase) % N])
{
hasMembers[log2(rankOf)] = true;
break;
}
}
}
}
using (var tmpLockObj = new LockAndElevate(hmLock))
{
// Safely delete the old map if it had one, then remember this new mapping
if (hmInfo.ContainsKey(gaddr))
{
hmInfo.Remove(gaddr);
}
hmInfo.Add(gaddr, hasMembers);
}
}
internal void becomeClientOf(Address who)
{
if (Vsync.ClientOf == null && Vsync.ORACLE.theView.GetMyRank() != -1)
{
// The multicast reaches all system members, but (obviously) ORACLE members ignore it
return;
}
if (Vsync.ClientOf != null && Vsync.ClientOf != who)
{
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (!Vsync.RIPList.Contains(Vsync.ClientOf))
{
Vsync.RIPList.Add(Vsync.ClientOf);
}
}
}
Vsync.ClientOf = who;
Vsync.OracleFailedAt = 0;
Vsync.ORACLE.theView.members[0] = who;
Group g;
if ((g = Group.Lookup(Vsync.ORACLE.gname)) != null)
{
g.GroupClose();
}
ILock.Barrier(ILock.LLWAIT, ILock.LCLIENTOF).BarrierRelease(1);
}
private void sortThenDeliverInOrder()
{
Address[] Senders;
int[] vids, msgids;
this.GenerateOrdering(out Senders, out vids, out msgids, true);
}
private void GenerateOrdering(out Address[] senders, out int[] vids, out int[] msgids, bool flushing)
{
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("Vsync.ORDEREDSEND: GenerateOrdering, flushing=" + flushing);
}
if (flushing)
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
int cnt = 0;
foreach (Msg m in this.OutOfOrderQueue)
{
if (m.ordered)
{
++cnt;
}
}
if (this.desiredOrderQueue.Count != 0 || cnt > 0)
{
string doq = " ", moq = " ";
foreach (svi sv in this.desiredOrderQueue)
{
doq += sv.sender + "::" + sv.vid + ":" + sv.msgid + " ";
}
foreach (Msg m in this.OutOfOrderQueue)
{
moq += m.sender + "::" + m.vid + ":" + m.msgid + (m.ordered ? "*ordered" : string.Empty) + " ";
}
Vsync.WriteLine("GenerateOrdering: During flush found DesiredOrderQueue = {" + doq + "}, OutOfOrderQueue = { " + moq + "}");
}
}
this.desiredOrderQueue = new List<svi>();
this.OutOfOrderQueue.Sort();
}
int idx = 0;
foreach (Msg m in this.OutOfOrderQueue)
{
if (!flushing && m.ordered)
{
continue;
}
++idx;
}
senders = new Address[idx];
vids = new int[idx];
msgids = new int[idx];
idx = 0;
foreach (Msg m in this.OutOfOrderQueue)
{
if (!flushing && m.ordered)
{
continue;
}
m.ordered = true;
senders[idx] = m.sender;
vids[idx] = m.vid;
msgids[idx] = m.msgid;
++idx;
}
}
if (flushing)
{
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("SortThenDeliverInOrder<" + this.gname + ">: " + this.OrderToString(senders, vids, msgids));
}
this.DeliverInOrder("Vsync.SortThenDeliverInOrder", senders, vids, msgids);
this.desiredOrderQueue = new List<svi>();
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
this.OutOfOrderQueue = new List<Msg>();
this.OutOfOrderQueueCount = 0;
}
}
else if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("GenerateOrdering<" + this.gname + ">: " + this.OrderToString(senders, vids, msgids));
}
}
internal bool onDOQ(Msg m)
{
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
foreach (svi sviQE in this.desiredOrderQueue)
{
if (sviQE.vid == m.vid && sviQE.msgid == m.msgid && sviQE.sender == m.sender)
{
return true;
}
}
return false;
}
}
internal void DeliverInOrder(string cfrom, Address[] senders, int[] vids, int[] msgids)
{
int vid = -1;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
vid = this.theView.viewid;
}
}
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
for (int n = 0; n < senders.Length; n++)
{
if (vids[n] < vid)
{
continue;
}
foreach (svi sviQE in this.desiredOrderQueue)
{
if (sviQE.vid == vids[n] && sviQE.msgid == msgids[n] && sviQE.sender == senders[n])
{
throw new VsyncException("OrderedSend: multiple orderings for " + senders[n] + "::" + vids[n] + ":" + msgids[n]);
}
}
this.desiredOrderQueue.Add(new svi(senders[n], vids[n], msgids[n]));
}
List<Msg> toDeliver = new List<Msg>();
svi sv;
while ((sv = this.desiredOrderQueue.FirstOrDefault()) != null)
{
if (sv.vid < vid)
{
this.desiredOrderQueue.RemoveAt(0);
continue;
}
bool fnd = false;
List<Msg> newOOQ = new List<Msg>();
foreach (Msg m in this.OutOfOrderQueue)
{
if (m.vid == sv.vid && m.msgid == sv.msgid && m.sender == sv.sender)
{
this.desiredOrderQueue.RemoveAt(0);
if (fnd)
{
throw new VsyncException("DeliverInOrder<" + cfrom + ">: Duplicate redelivery of " + m.sender + "::" + m.vid + ":" + m.msgid);
}
fnd = true;
m.type = Msg.REDELIVERY;
toDeliver.Add(m);
}
else
{
newOOQ.Add(m);
}
}
this.OutOfOrderQueue = newOOQ;
this.OutOfOrderQueueCount = this.OutOfOrderQueue.Count;
if (!fnd)
{
break;
}
}
if (toDeliver.Count > 0)
{
this.incomingSends.putFront(toDeliver);
}
}
}
internal void GotLastSeqns(int vid, int[] LastSeqns)
{
if (this == Vsync.ORACLE || this.PendingQueue == null || (this.flags & G_ISLARGE) != 0)
{
// Before first view is known, or in ORACLE (which has a special structure), or in large groups, which implement view synchrony in a different way
return;
}
using (var tmpLockObj = new LockAndElevate(this.PendingQueueLock))
{
if (this.theView.viewid != vid)
{
throw new VsyncException("GotLastSeqns: theView.vid=" + this.theView.viewid + ", but GotLastSeqns has vid=" + vid);
}
int myRank = this.theView.GetMyRank();
if (myRank == -1)
{
return;
}
this.theView.LastMsg = LastSeqns;
for (int i = 0; i < LastSeqns.Length; i++)
{
if (i == myRank || LastSeqns[i] == this.theView.NextIncomingMsgID[1 + i])
{
this.PendingQueue[1 + i] = null;
}
}
if ((VsyncSystem.Debug & VsyncSystem.LOWLEVELMSGS) != 0)
{
using (var tmpLockObj1 = new LockAndElevate(ReliableSender.ackInfoLock))
{
ReliableSender.ackInfo.Add("[" + Vsync.MsToSecs(Vsync.NOW) + "]: GotLastSeqns: nullify PendingQueue for <" + this.gname + ">" + Environment.NewLine);
}
}
}
}
internal bool IAmLeader()
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (!this.HasFirstView || theView.hasFailed == null)
{
return false;
}
return theView.IAmLeader();
}
internal bool IAmRank0()
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
return theView.IAmRank0();
}
internal int nLive()
{
int n = 0;
foreach (bool b in this.theView.hasFailed)
{
if (!b)
{
++n;
}
}
return n;
}
/// <exclude>
/// <summary>
/// Internal, used so that Groups can be stored in hashed collections; the hash is derived from the group address
/// </summary>
/// <returns>A hashcode for this group</returns>
/// </exclude>
public override int GetHashCode()
{
return this.gaddr.GetHashCode();
}
/// <summary>
/// Looks up a group by Address
/// </summary>
/// <param name="gaddr">An Address for the desired group</param>
/// <returns>a Group object, or null if not found</returns>
public static Group Lookup(Address gaddr)
{
if (gaddr == null || Vsync.ORACLE == null || Vsync.VSYNCMEMBERS == null || gaddr == Vsync.ORACLE.gaddr || gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
return null;
}
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
Group g;
if (VsyncGroups.TryGetValue(gaddr, out g))
{
return g;
}
}
return null;
}
internal static Group Lookup(int vaddr)
{
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
if (kvp.Value.myVirtIPAddr == vaddr)
{
return kvp.Value;
}
}
}
return null;
}
internal static Group doLookup(Address gaddr)
{
if (gaddr == null)
{
return null;
}
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
Group g;
if (VsyncGroups.TryGetValue(gaddr, out g))
{
return g;
}
}
if (Vsync.ORACLE != null && gaddr == Vsync.ORACLE.gaddr)
{
return Vsync.ORACLE;
}
if (Vsync.VSYNCMEMBERS != null && gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
return Vsync.VSYNCMEMBERS;
}
return null;
}
/// <summary>
/// Looks up a group by name
/// </summary>
/// <param name="gname">the group name</param>
/// <returns>a Group object, or null if not found</returns>
public static Group Lookup(string gname)
{
if (gname.Equals("ORACLE", StringComparison.Ordinal) || gname.Equals("VSYNCMEMBERS", StringComparison.Ordinal))
{
return null;
}
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
if (kvp.Value.gname.Equals(gname, StringComparison.Ordinal))
{
return kvp.Value;
}
}
}
return null;
}
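// Usage sketch (illustrative; "myGroup" is a hypothetical group name, not one defined by Vsync):
//   Group g = Group.Lookup("myGroup");
//   if (g == null)
//   {
//       // No group of that name was found in the VsyncGroups table. Note that the reserved names
//       // "ORACLE" and "VSYNCMEMBERS" always yield null from this public overload.
//   }
// The match uses StringComparison.Ordinal, so the name is case-sensitive.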
internal static Group doLookup(string gname)
{
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
if (kvp.Value.gname.Equals(gname, StringComparison.Ordinal))
{
return kvp.Value;
}
}
}
if (gname.Equals("ORACLE", StringComparison.Ordinal))
{
return Vsync.ORACLE;
}
return null;
}
/// <summary>
/// Gets the current View of a group.
/// </summary>
/// <returns>the current view of the group</returns>
/// <remarks>
/// The View is a central property of the Vsync model. Learn more about it by reading about
/// <i>virtual synchrony</i>. With the View you can build very sophisticated workload sharing and fault-tolerance mechanisms
/// easily and with confidence in their correctness.
/// </remarks>
public View GetView()
{
return this.theView;
}
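// Usage sketch for view-based workload sharing (illustrative only; "g", "items" and "Process" are hypothetical
// names supplied by the caller, and the sketch assumes View.members and View.GetMyRank() are accessible to it):
//   View v = g.GetView();
//   if (v != null)
//   {
//       int rank = v.GetMyRank(); // -1 when this process is not in the view
//       int n = v.members.Length;
//       for (int i = 0; rank != -1 && i < items.Length; i++)
//       {
//           if (i % n == rank)
//           {
//               Process(items[i]); // each member handles the items whose index maps to its own rank
//           }
//       }
//   }
// Members that evaluate this rule in the same view agree on the partition without exchanging extra messages.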
/// <summary>
/// Gets the name of a group
/// </summary>
/// <returns>the group name</returns>
public string GetName()
{
return this.gname;
}
private string PFlags(int flags)
{
if (flags == 0)
{
return string.Empty;
}
string s = string.Empty;
for (int n = 0; (1 << n) <= flags; n++)
{
if ((flags & (1 << n)) != 0)
{
s += flagNames[n] + " ";
}
}
return "{ " + s + "}";
}
/// <exclude>
/// <summary>
/// Internal, pretty-prints the state of a group for debugging purposes
/// </summary>
/// <returns>string representing the group state</returns>
/// </exclude>
public override string ToString()
{
try
{
if (this == Vsync.ORACLE && Vsync.ClientOf != null)
{
return "ORACLE requests: I access the ORACLE remotely as a client of " + Vsync.ClientOf;
}
tokenInfo theToken;
View theView;
int msgid;
using (var tmpLockObj = new LockAndElevate(this.TokenLock))
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
theToken = this.theToken;
theView = this.theView;
msgid = this.nextMsgid;
}
int f;
string ssNeeded = string.Empty;
using (var tmpLockObj = new LockAndElevate(this.GroupFlagsLock))
{
f = this.flags;
if (this.HasFirstView)
{
string psr = string.Empty;
if (theView.PendingSends.Count > 0)
{
psr = ", " + theView.PendingSends.Count + " PendingSend records (minMsgId=" + theView.PendingSends.First().msgid + ")";
}
ssNeeded = ", " + (theView.minStable > theView.lastStabilitySent ? "Need" : "Don't need") + " to send stability info (last sent at " + Vsync.MsToSecs(this.SentStableAt) + "; MaxBacklogSent=" + this.CurrentBacklog + psr + ")";
}
}
string repAddr = "[VirtIP: " + MCMDSocket.PMCAddr(this.myVirtIPAddr) + ", PhysIP: " + MCMDSocket.PMCAddr(this.myPhysIPAddr) + "]";
string s = !this.HasFirstView ? string.Empty : (Environment.NewLine + " " + theView);
s = "group <" + this.gname + (this.where == null ? string.Empty : "(created by " + this.where + ")") + ">... gaddr " + this.gaddr + this.PFlags(f) + ", IP address " + repAddr + s + Environment.NewLine + " My Received multicast count " + this.rcvdMcastsCnt + ", rate " + this.rcvdMcastsRate + ", next outgoing msgid " + msgid + ssNeeded;
if ((this.flags & G_ISLARGE) != 0 && !this.isTrackingProxy)
{
if (theToken != null)
{
s += Environment.NewLine + " " + theToken;
}
else
{
s += Environment.NewLine + " .... Token still uninitialized";
}
s += Environment.NewLine + " Large-group garbage collection has collected messages with (vid:id) in range (*:[0-" + this.gcollectedTo + "])";
}
using (var tmpLockObj = new LockAndElevate(this.groupLock))
using (var tmpLockObj1 = new LockAndElevate(this.curMsgListLock))
{
foreach (KeyValuePair<Thread, Msg> kvp in this.curMsgList)
{
s += Environment.NewLine + " Thread " + (kvp.Key.Name ?? "(unnamed)") + " is currently busy delivering " + kvp.Value;
}
}
if (this.incomingSends != null)
{
using (var tmpLockObj = new LockAndElevate(this.incomingSends.Lock))
{
if (this.incomingSends.fullSlots > 0)
{
s += Environment.NewLine + " A user-level delay during multicast delivery has allowed multicasts to enqueue (count=" + this.incomingSends.fullSlots + "):";
for (int i = 0; i < this.incomingSends.fullSlots; i++)
{
object o = this.incomingSends.theBuffer[(this.incomingSends.gNext + i) % this.incomingSends.size];
if (o == null)
{
s += Environment.NewLine + " (null entry: EOF)";
}
else if (o.GetType() == typeof(Msg))
{
s += Environment.NewLine + " Incoming multicast ready for delivery: " + o;
}
else if (o.GetType() == typeof(View))
{
s += Environment.NewLine + " " + o;
}
}
}
}
}
if (this.incomingP2P != null)
{
using (var tmpLockObj = new LockAndElevate(this.incomingP2P.Lock))
{
if (this.incomingP2P.fullSlots > 0)
{
s += Environment.NewLine + " A user-level delay during p2p delivery has allowed p2p messages to enqueue (count=" + this.incomingP2P.fullSlots + "):";
for (int i = 0; i < this.incomingP2P.fullSlots; i++)
{
object o = this.incomingP2P.theBuffer[(this.incomingP2P.gNext + i) % this.incomingP2P.size];
if (o == null)
{
s += Environment.NewLine + " (null entry: EOF)";
}
else if (o.GetType() == typeof(Msg))
{
s += Environment.NewLine + " Incoming P2P ready for delivery: " + o;
}
else if (o.GetType() == typeof(View))
{
s += Environment.NewLine + " " + o;
}
}
}
}
}
using (var tmpLockObj = new LockAndElevate(this.ToDoLock))
{
foreach (Msg m in this.ToDo)
{
s += Environment.NewLine + " ToDo: " + m;
}
}
using (var tmpLockObj = new LockAndElevate(this.P2PStashLock))
{
foreach (Msg m in this.P2PStash)
{
s += Environment.NewLine + " Stashed: " + m;
}
}
if (this.PendingQueue != null)
{
using (var tmpLockObj = new LockAndElevate(this.PendingQueueLock))
{
for (int i = 0; i < this.PendingQueue.Length; i++)
{
if (this.PendingQueue[i] == null || this.PendingQueue[i].Count == 0)
{
continue;
}
s += Environment.NewLine + " PendingQueue";
if ((this.flags & G_ISLARGE) == 0)
{
if (i > 0)
{
s += " for sender " + theView.members[i - 1];
}
else
{
s += " from ORACLE";
}
}
s += ":";
foreach (KeyValuePair<long, Msg> kvp in this.PendingQueue[i])
{
s += Environment.NewLine + " " + kvp.Value;
}
}
}
}
using (var tmpLockObj = new LockAndElevate(this.CausalOrderListLock))
{
if (this.CausalOrderList.Count > 0)
{
s += Environment.NewLine + " CausalSend multicasts awaiting delivery (myVT = " + VTtoString(theView.myVT) + "):";
foreach (ctuple ct in this.CausalOrderList)
{
s += Environment.NewLine + " " + VTtoString(ct.theVT) + " " + ct.theMsg;
}
}
}
using (var tmpLockObj = new LockAndElevate(this.OutOfOrderQueueLock))
{
if (this.OutOfOrderQueue.Count > 0)
{
s += Environment.NewLine + " Known ordering information: { ";
foreach (svi sv in this.desiredOrderQueue)
{
s += "[" + sv.sender + "::" + sv.vid + ":" + sv.msgid + "] ";
}
s += "}" + Environment.NewLine + " Totally Ordered Sends Awaiting Order Info (ql=" + this.OutOfOrderQueueCount + "):";
foreach (Msg m in this.OutOfOrderQueue)
{
s += Environment.NewLine + " " + m;
}
}
}
using (var tmpLockObj = new LockAndElevate(this.OrderedSubsetListLock))
{
if (this.OrderedSubsetPQ.Count > 0)
{
s += Environment.NewLine + " Subset multicasts awaiting order information (myTS=" + this.myTS + "):";
foreach (KeyValuePair<osspq, Msg> kvp in this.OrderedSubsetPQ)
{
s += Environment.NewLine + " " + kvp;
}
}
}
using (var tmpLockObj = new LockAndElevate(this.UnstableLock))
{
if (this.Unstable != null && this.Unstable.Count > 0)
{
s += Environment.NewLine + " Unstable multicast messages: {";
foreach (Msg m in this.Unstable)
{
s += " " + m.sender + "[" + m.vid + ":" + m.msgid + "]";
}
s += " }";
}
}
return s;
}
catch (Exception e)
{
return "Unable to print state of group " + (this.gname ?? "(name unknown)") + " because " + e;
}
}
/// <exclude>
/// <summary>
/// Internal, pretty-prints the state of all groups, the membership ORACLE and any tracking proxies for debugging purposes
/// </summary>
/// <returns>string representing the state of every known group</returns>
/// </exclude>
public static string GetState()
{
string state = string.Empty;
List<Group> isClone = new List<Group>();
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
isClone.Add(kvp.Value);
}
}
if (isClone.Count != 0)
{
state += "GROUPS:" + Environment.NewLine;
foreach (Group g in isClone)
{
if (g != Vsync.ORACLE)
{
try
{
state += GetGroupState(g) + " ----------------------------------------------------------------------------------------------------" + Environment.NewLine;
}
catch (Exception)
{
if (g.gname != null)
{
state += "Vsync threw an exception while trying to print the state of group <" + g.gname + ">" + Environment.NewLine;
}
}
}
}
}
state += "MEMBERSHIP ORACLE:" + Environment.NewLine + (Vsync.ORACLE == null ? "NO ORACLE" : GetGroupState(Vsync.ORACLE));
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
if (TPGroups.Count != 0)
{
state += " TRACKING PROXIES:" + Environment.NewLine;
foreach (KeyValuePair<Address, Group> kvp in TPGroups)
{
Group tpg = kvp.Value;
try
{
state += " " + Group.TPtoString(tpg) + Environment.NewLine;
}
catch (Exception)
{
if (tpg.gname != null)
{
state += "Vsync threw an exception while trying to print the state of group <" + tpg.gname + ">" + Environment.NewLine;
}
}
}
}
}
if (Vsync.Proposed != null && Vsync.Proposed.Length > 0)
{
state += " PROPOSED VIEW DELTAS:" + Environment.NewLine;
foreach (Vsync.ViewDelta vd in Vsync.Proposed)
{
state += " " + vd + Environment.NewLine;
}
}
state += vgGetState() + Environment.NewLine;
state += dumpStash();
return state;
}
private static string GetGroupState(Group g)
{
string s = " " + (g.GroupOpen ? string.Empty : "** CLOSED **") + g + Environment.NewLine;
if ((g.flags & G_ISLARGE) != 0)
{
s += g.LGRelayGetState();
}
else
{
s += g.AggState();
}
s += g.GetLockState();
if ((g.flags & G_USESOOB) != 0)
{
s += OOBState(g, false);
}
return s;
}
/// <summary>
/// Associate a name with a group.
/// </summary>
/// <param name="gname">the group name</param>
/// <remarks>
/// Rarely used, this method associates a group name with a group. For a durable group the name is typically a file name
/// in the global file system and will be the file in which group keys and checkpoint data is saved by Vsync. Access to the
/// file plays a central role in the Vsync protection scheme: users who can't access that file can't access the group key
/// and hence can't see the data if the group is a secured one. In fact Vsync has no admission control policies of its own:
/// it just rides along on the file system security architecture in this sense.
/// </remarks>
public void Bind(string gname)
{
if (this.GroupOpen)
{
return;
}
this.gname = gname;
}
internal void SetMap(string where, int[] mm)
{
this.myVirtIPAddr = mm[MCMDSocket.VIRTUAL];
this.myPhysIPAddr = mm[MCMDSocket.PHYSICAL];
MCMDSocket.SetMap(where, this.gname, false, mm);
}
internal void NewView(View v, string calledFrom, int[] curMap)
{
View nv = null;
this.NewView(v, calledFrom, curMap, ref nv);
}
internal void NewView(View v, string calledFrom, int[] curMap, ref View nv)
{
bool hadFirstView = this.HasFirstView;
if ((VsyncSystem.Debug & (VsyncSystem.MSGIDS | VsyncSystem.VIEWCHANGE | VsyncSystem.STARTSEQ)) != 0)
{
Vsync.WriteLine("ENTERING NEWVIEW[caller:" + calledFrom + "]: Group " + this.gname + ", with new view=" + v);
}
if (v.viewid > 0 && v.members.Length > 0 && v.joiners.Contains(v.members[0]))
{
// This can happen if a group loses its (only) current member just as someone is joining...
this.InitializeGroup(v);
}
if (this.isTrackingProxy)
{
// These are used only by the ORACLE to track groups on behalf of their members
// and by clients to track the ORACLE itself
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
this.theView = v;
this.nextMsgid = 0;
}
if (!hadFirstView && v.joiners.Length == 0)
{
v.joiners = new[] { Vsync.my_address };
}
this.HasFirstView = true;
replayStash(this);
List<byte[]> ae = this.IPMCArrivedEarly;
List<ReliableSender.MReplayMe> mae = this.MsgArrivedEarly;
this.IPMCArrivedEarly = null;
this.MsgArrivedEarly = null;
MCMDSocket.drainIPMCArrivedEarly(ae);
this.drainEarlyArrivalMsgQ(mae);
if (v.members.Length == 0)
{
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
TPGroups.Remove(this.gaddr);
}
}
return;
}
if (!this.HasFirstView)
{
if (!this.gname.Equals("ORACLE", StringComparison.Ordinal) && v.GetMyRank() == -1)
{
Vsync.WriteLine("NewView: I'm not actually IN the first view for " + this.gname + Environment.NewLine + "The view is " + v);
return;
}
using (var tmpLockObj = new LockAndElevate(this.ToDoLock))
{
if (this.ToDo.Count > 0)
{
List<Msg> newToDo = new List<Msg>();
foreach (Msg m in this.ToDo)
{
if (m.vid >= v.viewid)
{
newToDo.Add(m);
}
}
this.ToDo = newToDo;
this.ToDoCount = this.ToDo.Count;
}
}
}
if ((this.flags & G_ISLARGE) == 0 || !this.HasFirstView)
{
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
this.nextMsgid = 0;
}
}
if (curMap != null)
{
if (this != Vsync.ORACLE || Vsync.ClientOf == null)
{
this.SetMap("Newview<" + this.gname + ">", curMap);
}
ReliableSender.StartGroupReader(this);
}
this.opqDrain();
if (v.joiners.Length > 0)
{
foreach (Address who in v.joiners)
{
this.WatchEvent(who, W_JOIN);
}
}
if (v.leavers.Length > 0)
{
foreach (Address who in v.leavers)
{
this.WatchEvent(who, W_LEAVE);
}
}
if (this.theView != null && this.theView.viewid >= v.viewid)
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("NEWVIEW: IGNORE view update for Group " + this.gname + ", " + v);
}
return;
}
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("NEWVIEW(" + calledFrom + "): Group " + this.gname + ", " + v);
}
SortedList<long, Msg>[] oldPendingQueue = this.PendingQueue;
int PQlen;
if ((this.flags & G_ISLARGE) == 0)
{
PQlen = v.members.Length + 1;
// In small groups, we flush when changing views, hence Unstable can be discarded. In large groups, the rule is different
using (var tmpLockObj = new LockAndElevate(this.UnstableLock))
{
this.Unstable = new List<Msg>();
this.UnstableCount = 0;
}
}
else
{
PQlen = 2;
}
if ((VsyncSystem.Debug & VsyncSystem.LOWLEVELMSGS) != 0)
{
using (var tmpLockObj = new LockAndElevate(ReliableSender.ackInfoLock))
{
ReliableSender.ackInfo.Add("[" + Vsync.MsToSecs(Vsync.NOW) + "]: newview: reinitialize PendingQueue for <" + this.gname + ">" + Environment.NewLine);
}
}
using (var tmpLockObj = new LockAndElevate(this.PendingQueueLock)) // Synchronized with GotAMsg, which is a real-time procedure that needs to be rather nimble...
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
this.sortThenDeliverInOrder();
this.finalizePendingSafeSends(v);
if (v.joiners.Length > 0)
{
if (this.theChkptChoser == null && v.IAmLeader())
{
using (var tmpLockObj2 = new LockAndElevate(this.theChkptMakersLock))
{
v.theChkptMakers = this.theChkptMakers;
}
}
else if (this.theChkptChoser != null)
{
foreach (Address who in v.joiners)
{
if (this.theChkptChoser(v, who))
{
using (var tmpLockObj2 = new LockAndElevate(this.theChkptMakersLock))
{
v.theChkptMakers = this.theChkptMakers;
}
break;
}
}
}
}
this.theView = v;
this.nRaw = 0;
this.receivedOrderedSends = false;
for (int i = 0; i < this.myDHTnShards; i++)
{
this.lastVersionId[i] = -1;
}
nv = v;
this.incomingSends.put(v);
SortedList<long, Msg>[] oldPQ = this.PendingQueue;
this.PendingQueue = new SortedList<long, Msg>[PQlen];
this.PendingQueueCount = 0;
for (int i = 0; i < PQlen; i++)
{
this.PendingQueue[i] = new SortedList<long, Msg>();
}
if (oldPQ != null && (VsyncSystem.Debug & VsyncSystem.DISCARDS) != 0)
{
foreach (SortedList<long, Msg> sl in oldPQ)
{
if (sl != null && sl.Count > 0)
{
string ms = " ";
foreach (KeyValuePair<long, Msg> kvp in sl)
{
ms += kvp.Value.sender + "::" + kvp.Value.vid + ":" + kvp.Value.msgid + " ";
}
Vsync.WriteLine("WARNING: Newview<" + this.gname + ">: When installing view " + v.viewid + " found and discarded undelivered retained messages {" + ms + "}");
}
}
}
}
using (var tmpLockObj = new LockAndElevate(this.CausalOrderListLock))
{
this.CausalOrderList = new List<ctuple>();
this.CausalOrderListCount = 0;
}
this.HasFirstView = true;
if ((this.flags & G_ISLARGE) == 0)
{
this.resetSGAggregations();
}
if (oldPendingQueue != null)
{
new Thread(() =>
{
try
{
for (int oldi = 0; oldi < oldPendingQueue.Length && (this.GroupOpen || !this.WasOpen); oldi++)
{
if (oldPendingQueue[oldi] != null)
{
foreach (KeyValuePair<long, Msg> kvp in oldPendingQueue[oldi])
{
if (kvp.Value.vid >= v.viewid)
{
this.GotAMsg(kvp.Value, Msg.MULTICAST, "Newview");
}
else
{
if ((VsyncSystem.Debug & VsyncSystem.DISCARDS) != 0)
{
Vsync.WriteLine("Discarding a duplicate in NewView: message had a stale viewid");
}
using (var tmpLockObj = new LockAndElevate(VsyncSystem.RTS.Lock))
{
++VsyncSystem.RTS.Discarded;
}
}
}
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "replay OldPendingQueue contents in g.NewView()", IsBackground = true }.Start();
}
if (this.durabilityMethod != null)
{
this.durabilityMethod.NewView(v);
if (v.GetMyRank() == 0 && !this.IAmSafeSendLeader)
{
this.durabilityMethod.PlayBack();
}
}
replayStash(this);
this.theView.isFinal = false;
this.GroupOpen = true;
ReliableSender.resetTheToken(this);
this.UpdateShortCuts(v);
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("Releasing LLINITV barrier lock for gaddr " + this.gaddr + "(Lock [" + ILock.LLINITV + "][" + ILock.GetLockId(ILock.LLINITV, this.gaddr.GetHashCode()) + "])");
}
ILock.Barrier(ILock.LLINITV, this.gaddr).BarrierReleaseAll();
if (v.members.Length == 0)
{
this.GroupClose();
}
}
/* Reset aggregation info in a small group */
private void resetSGAggregations()
{
if (!this.HasFirstView)
{
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("Reinitialize the AggList<" + this.gname + "> called but HasFirstView=false");
}
return;
}
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
int N = theView.members.Length;
if (N == 0)
{
return;
}
LinkedList<object>[] oldAggList;
using (var tmpLockObj = new LockAndElevate(this.AggListLock))
{
oldAggList = this.AggList;
int nlevels = log2RU(N) + 1;
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("Reinitialize the AggList<" + this.gname + "> for VID=" + theView.viewid + "[nlevels=" + nlevels + "] from the AggTypes List in <" + this.gname + ">... (it lists " + this.AggTypes.Count + " types)");
}
this.AggList = new LinkedList<object>[nlevels];
binfo.resetBarrierList();
for (int n = 0; n < nlevels; n++)
{
this.AggList[n] = new LinkedList<object>();
foreach (AggInfo ag in this.AggTypes)
{
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("Calling constructor in <" + this.gname + "> to allocate a new aggregator of type " + ag.KVT);
}
// These are actually "aggregator" objects of some derived type
this.AggList[n].AddLast(ag.myFactory.Invoke(new[] { this, theView.viewid, n, ag.theDel, ag.theTimeout }));
}
}
}
if (oldAggList != null)
{
foreach (LinkedList<object> item in oldAggList)
{
foreach (IAggregateEventHandler ae in item)
{
ae.AggEvent(Group.BreakWaits, theView.viewid, null, null, 0);
}
}
}
}
// Takes advantage of the fact that commit has the effect of also making anything in the SS list atomic
// The list is sorted in a standard order and now all members of View v will also have the same list contents
// So, we can safely deliver these messages prior to installing the new view.
private void finalizePendingSafeSends(View v)
{
SortedList<SUTW, Msg> newSSList = new SortedList<SUTW, Msg>();
using (var tmpLockObj = new LockAndElevate(this.SSLock))
{
foreach (KeyValuePair<SUTW, Msg> ss in this.SSList)
{
if (!ss.Key.commitFlag)
{
ss.Key.commitFlag = true;
// The value picked must be larger than anything on the queue but also deterministic: all group members must use the same value for the same message.
// We also prefer to respect the ordering on message id's. This particular TS should do the trick: these messages will move to the end of the queue
// and then be delivered in msgid order, ties broken by sender id
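// Worked arithmetic (illustrative): int.MaxValue is 2147483647, so the base value is 2147483647 - 10000 = 2147473647,
// and a message with msgid 7 gets TS = 2147473654. The scheme appears to assume that msgid stays at or below 10000 here;
// a larger msgid would push the sum past int.MaxValue.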
ss.Key.Who = ss.Key.Sender;
ss.Key.TS = int.MaxValue - 10000 + ss.Value.msgid;
if ((VsyncSystem.Debug & VsyncSystem.SAFESEND) != 0)
{
Vsync.WriteLine("SAFEDELIVER[FINALIZE]: " + ss.Value.sender + "::" + ss.Value.vid + ":" + ss.Value.msgid + "... SET ORDER: " + ss.Key.Sender + "::*+" + (ss.Key.TS - (int.MaxValue - 10000)));
}
newSSList.Add(ss.Key, ss.Value);
}
else
{
newSSList.Add(ss.Key, ss.Value);
}
}
this.SSList = newSSList;
if ((VsyncSystem.Debug & VsyncSystem.SAFESEND) != 0)
{
Vsync.WriteLine("SAFESEND[FINALIZE]: Queue after completion and resorting: ");
foreach (KeyValuePair<SUTW, Msg> ss in this.SSList)
{
Vsync.WriteLine(" (UID=" + ss.Key.Sender + "::" + ss.Key.Uid + "): " + ss.Value.sender + "::" + ss.Value.vid + ":" + ss.Value.msgid + "... ORDER: " + ss.Key.Who + "::" + (ss.Key.TS >= int.MaxValue - 10000 ? ss.Key.TS - (int.MaxValue - 10000) : ss.Key.TS) + ")");
}
}
this.deliverSSItems();
}
}
private void drainEarlyArrivalMsgQ(List<ReliableSender.MReplayMe> mae)
{
if (mae == null || mae.Count == 0)
{
return;
}
new Thread(() =>
{
try
{
foreach (ReliableSender.MReplayMe m in mae)
{
ReliableSender.GotIncoming(m.type, m.gaddr, m.sender, m.minStable, m.msg, false);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "MsgDrain for <" + this.gname + ">", Priority = ThreadPriority.AboveNormal, IsBackground = true }.Start();
}
/// <summary>
/// Event code designating that the watched process has joined the group
/// </summary>
public const int W_JOIN = 0;
/// <summary>
/// Event code designating that the watched process has left the group
/// </summary>
public const int W_LEAVE = 1;
internal void WatchEvent(Address who, int ev)
{
myWatches mwl = this.Watch[who];
if (mwl == null)
{
return;
}
foreach (Watcher w in mwl.hList)
{
w(ev);
}
}
/// <summary>
/// Register a handler for incoming multicasts or queries in the group.
/// </summary>
/// <param name="request">The id for the requests corresponding to this handler</param>
/// <param name="del">A request handler; it will be called for messages with matching id and type signatures</param>
/// <remarks>
/// A process setting up a group uses Register to register a handler for a specific request identifier, which will be a small integer
/// counting up from 0. The delegate is of any type you wish, and there can be multiple handlers for the same request type. When Vsync
/// receives a message in a group it will invoke all handlers for which types exactly match. Unfortunately, Vsync can't do sophisticated
/// type inference at this time, so it expects exact matches: if a class Dog extends Animal, and a handler defines some argument of type
/// Animal, that handler won't be invoked for an incoming message containing a Dog, even though a Dog is an Animal. Sorry!
/// </remarks>
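/// <example>
/// A minimal sketch of registering a handler. UPDATE, UpdateHandler and the group variable g are hypothetical names chosen for illustration;
/// they are not defined by Vsync.
/// <code>
/// // Hypothetical request id and delegate type declared by the application
/// const int UPDATE = 0;
/// delegate void UpdateHandler(string name, double price);
///
/// // In the code that sets up the group, before it starts receiving messages:
/// g.RegisterHandler(UPDATE, new UpdateHandler((name, price) =>
/// {
///     Console.WriteLine("update " + name + " -> " + price);
/// }));
/// </code>
/// Because matching is exact, this handler would be invoked only for UPDATE messages that carry exactly a string followed by a double.
/// </example>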
public void RegisterHandler(int request, Delegate del)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
this._RegisterHandler(request, del);
}
internal class ChkptMkr
{
private readonly Group myGroup;
public ChkptMkr(Group g)
{
this.myGroup = g;
}
public static ChkptMkr operator +(ChkptMkr a, ChkptMaker maker)
{
a.myGroup.RegisterMakeChkpt(maker);
return a;
}
public static ChkptMkr operator -(ChkptMkr a, ChkptMaker maker)
{
a.myGroup.UnRegisterMakeChkpt(maker);
return a;
}
}
internal ChkptMkr MakeChkpt;
/// <summary>
/// A process setting up a group uses RegisterMakeChkpt to register a method that will create a checkpoint
/// for storage into a file (when terminating the group) or for state-transfer to a new joining member.
/// </summary>
/// <param name="maker">checkpoint making procedure, of type ChkptMaker</param>
public void RegisterMakeChkpt(ChkptMaker maker)
{
using (var tmpLockObj = new LockAndElevate(this.theChkptMakersLock))
{
this.theChkptMakers.Add(maker);
}
}
/// <summary>
/// Removes a checkpoint-making method that was previously registered with RegisterMakeChkpt.
/// </summary>
/// <param name="maker">the checkpoint making procedure, of type ChkptMaker, to remove</param>
public void UnRegisterMakeChkpt(ChkptMaker maker)
{
using (var tmpLockObj = new LockAndElevate(this.theChkptMakersLock))
{
this.theChkptMakers.Remove(maker);
}
}
/// <summary>
/// This API is used in situations where the checkpoint for a joining member must be from a source
/// that depends on who the joiner is. It cannot be used by the application if the group is using
/// the Vsync DHT, which has its own special choser, because only one choser function can be specified per group.
/// </summary>
/// <param name="choser">Called once per joining member in all group members who were in the prior view, returns true in the single member who will
/// make the checkpoint on behalf of that joining member. Notice that if many members all join at once, different sources can send the checkpoint
/// for different receivers. We highly recommend aiming for that kind of parallelism if you will have large numbers of joiners. The DHT
/// uses this scheme because for any given joiner, only certain group members are suitable checkpoint senders</param>
/// <remarks>Using this API, a group designer can control which group member is selected to send state transfers, on a joiner by joiner basis.
/// The method will be invoked in parallel at all group members as a new view is about to be installed, and each returns true or false. The intent is
/// that just one returns true for any particular joiner; it will create a checkpoint, which will be sent to that joiner.
///
/// For example, in a group that internally replicates data in multiple patterns (perhaps, it shards a relational database, with each shard
/// on a different subset of the group members), when a new member joins a shard, the state transfer should be for that shard. In such cases
/// you can use a CheckpointChoser function to select the lowest ranked member in the same shard as the joiner. If you selected some random
/// member, or always transferred state from the rank-0 member (the leader) of the group, as in the default behavior for Vsync, the sender
/// wouldn't have that shard and hence couldn't send the needed state.
///
/// Clearly, this entire approach makes sense only if you can define a selector function that operates purely on the member's Address and the
/// membership of the group. For example, you might take the hashcode of an address and compute that value modulo some integer K to map the
/// members into the range 0..K-1. These K "subgroups" could then play the role of shard groups. Vsync can support fancier mappings that might
/// depend on other information, but you would need to be sure that all members have the identical state when the choser is invoked.
///
/// WARNING: If every member returns false, the joiner hangs waiting for state transfer, then a timeout for the join occurs, and the new member will be
/// dropped from the system (it will throw a "poison" exception). Conversely, if the member that offers to do the state transfer crashes, the join
/// will fail, throwing a "join failed" exception.
/// </remarks>
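/// <example>
/// A minimal sketch of the hashcode-modulo-K mapping described above. K and ShardOf are hypothetical names, not part of Vsync, and the
/// ChkptChoser signature is not reproduced here, so only the mapping itself is illustrated. It assumes that an Address yields the same
/// hashcode in every member.
/// <code>
/// const int K = 4;   // number of shards (illustrative value)
///
/// static int ShardOf(Address a)
/// {
///     // The remainder can be negative when the hashcode is negative, so take its absolute value to land in 0..K-1
///     return Math.Abs(a.GetHashCode() % K);
/// }
/// </code>
/// A choser built on this mapping would return true only in the lowest-ranked prior member whose ShardOf value equals that of the joiner.
/// </example>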
public void RegisterChkptChoser(ChkptChoser choser)
{
if (this.theChkptChoser != null && this.theChkptChoser != choser)
{
throw new VsyncException("RegisterChkptChoser: Attempt to register two checkpoint chosers for group <" + this.gname + ">");
}
this.theChkptChoser = choser;
}
internal class ChkptLdr
{
private readonly Group myGroup;
public ChkptLdr(Group g)
{
this.myGroup = g;
}
public static ChkptLdr operator +(ChkptLdr a, Delegate loader)
{
a.myGroup.RegisterLoadChkpt(loader);
return a;
}
}
internal ChkptLdr LoadChkpt;
/// <summary>
/// Register a checkpoint loading method
/// </summary>
/// <param name="loader">Checkpoint loading method</param>
/// <remarks>
/// A process setting up a group uses RegisterLoadChkpt to define a method that will load a checkpoint as part of
/// a state-transfer to initialize joining members if the group already is active when they join
/// </remarks>
public void RegisterLoadChkpt(Delegate loader)
{
this.RegisterHandler(Vsync.STATEXFER, loader);
this.flags |= G_NEEDSTATEXFER;
}
private static int tsl;
internal bool CheckPointFileExists()
{
return File.Exists(this.myCheckpointFile + ".chkpt");
}
internal void LoadCheckpointFromFile()
{
for (int retry = 0; retry < 3; retry++)
{
try
{
byte[] lb = new byte[4];
this.myChkptStream = new FileStream(this.myCheckpointFile + ".chkpt", FileMode.Open);
while (this.myChkptStream.Read(lb, 0, 4) == 4)
{
int len = (lb[3] << 24) | (lb[2] << 16) | (lb[1] << 8) | lb[0];
if (len < 0 || len > Vsync.VSYNC_MAXMSGLEN + 1024)
{
throw new VsyncException("Group <" + this.gname + ">: corrupted checkpoint file[len=" + len + " must be within 0.." + (Vsync.VSYNC_MAXMSGLEN + 1024) + "]");
}
byte[] buffer = new byte[len];
if (this.myChkptStream.Read(buffer, 0, len) != len)
{
throw new IOException("unexpected EOF");
}
if (!Msg.VerifySignature(buffer, 0, len))
{
throw new VsyncException("Group <" + this.gname + ">: corrupted checkpoint file[signature verification failed]");
}
if (this.userSpecifiedKey)
{
this.decipherBuf(buffer);
}
object[] obs = Msg.BArrayToObjects(buffer);
object[] args = new object[obs.Length + 1];
args[0] = Vsync.STATEXFER;
for (int i = 0; i < obs.Length; i++)
{
args[i + 1] = obs[i];
}
this.cbAction(null, -1, -1, Vsync.my_address, args);
}
this.myChkptStream.Close();
this.myChkptStream = null;
return;
}
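catch (Exception)
{
// Release the file handle before retrying; otherwise the next attempt to open the same file can fail with a sharing violation
if (this.myChkptStream != null)
{
this.myChkptStream.Close();
this.myChkptStream = null;
}
}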
}
throw new VsyncException("LoadCheckPointFromFile: I/O exception");
}
// Old style
internal void doRegister(int request, Delegate del)
{
this._RegisterHandler(request, del);
}
private void _RegisterHandler(int request, Delegate del)
{
this.Handlers[request] += new CallBack(false, del);
}
private RNGCryptoServiceProvider AesSeed;
/// <summary>
/// Places a group into secure mode, selecting a new randomly generated AES key for the initial create, and later reading the
/// key from the checkpoint .hdr file when a new member joins.
/// </summary>
/// <remarks>In this mode, Vsync stores the key in plain text form in the .hdr file, and checkpoints are also
/// unencrypted. Thus anyone with file system permissions could gain access to the group data. Use the overload
/// of SetSecure that lets you specify a key that you manage outside of Vsync if you want to avoid this particular threat.</remarks>
public void SetSecure()
{
if (this.GroupOpen)
{
throw new VsyncException("Illegal to call SetSecure after Group.Join");
}
if (this.myCheckpointFile == null)
{
throw new VsyncException("SetSecure: To have Vsync pick a group key, you must first call g.Persistent()");
}
this.flags |= G_SECURE;
this.InitializeMyAes();
this.myAESkey = new byte[this.myAes.KeySize >> 3];
this.AesSeed.GetBytes(this.myAESkey);
SetAesKey(this.myAes, this.myAESkey);
}
/// <summary>
/// Places a group into secure mode using a specified key that the end-user must manage externally to Vsync.
/// </summary>
/// <param name="theKey">AES key of exactly myAes.KeySize / 8 bytes. Vsync selects the largest legal AES key size, normally 256 bits, i.e. 32 bytes</param>
/// <remarks>The given key must be the same one used by other group members or some form of horrible problem will occur, probably a crash.
/// Also, keep in mind that an attacker with root privileges could still use a debugger to attach to the program and read myAESkey from memory.
/// Still, this form of security is stronger than if you let Vsync pick its own keys.</remarks>
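/// <example>
/// A minimal sketch, assuming a 32-byte (256-bit) key and a hypothetical LoadKeyFromMyKeyStore() helper that returns the same key in every
/// group member. Neither the helper nor the variable g is part of Vsync.
/// <code>
/// byte[] key = LoadKeyFromMyKeyStore();   // hypothetical: supplied by your own key management, not by Vsync
/// g.SetSecure(key);                        // throws ArgumentException if key.Length * 8 differs from the AES key size
/// </code>
/// </example>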
public void SetSecure(byte[] theKey)
{
this.flags |= G_SECURE;
this.InitializeMyAes();
if (theKey == null)
{
throw new ArgumentNullException("theKey");
}
if ((theKey.Length << 3) != this.myAes.KeySize)
{
throw new ArgumentException("Key must be a byte[" + (this.myAes.KeySize >> 3) + "] vector");
}
this.userSpecifiedKey = true;
this.myAESkey = theKey;
if (this.myAESkey == null || (this.flags & G_SECURE) == 0)
{
Vsync.WriteLine("Group <" + this.gname + ">, set myAes=null in SetAEsKey");
this.myAes = null;
return;
}
SetAesKey(this.myAes, this.myAESkey);
}
internal void InitializeMyAes()
{
if (this.AesSeed != null || (this.flags & G_SECURE) == 0)
{
return;
}
this.AesSeed = new RNGCryptoServiceProvider();
doInitializeAes(out this.myAes);
}
internal static void doInitializeAes(out Aes theAes)
{
theAes = Aes.Create();
if (theAes == null)
{
throw new VsyncException("myAes null after Aes.Create()");
}
theAes.Padding = PaddingMode.None;
KeySizes[] ks = theAes.LegalKeySizes;
int ksbits = 0;
int bsbits = 0;
foreach (KeySizes k in ks)
{
ksbits = Math.Max(ksbits, k.MaxSize);
}
ks = theAes.LegalBlockSizes;
foreach (KeySizes k in ks)
{
bsbits = Math.Max(bsbits, k.MaxSize);
}
theAes.KeySize = ksbits;
theAes.BlockSize = bsbits;
}
internal static void SetAesKey(Aes theAes, byte[] theAesKey)
{
bool allZero = true;
int nb = theAes.KeySize >> 3;
for (int b = 0; b < nb; b++)
{
if (theAesKey[b] != 0)
{
allZero = false;
}
}
if (allZero)
{
return;
}
theAes.Key = theAesKey;
}
/// <summary>
/// Declares that this group is persistent and names the file in which its state should be stored. Vsync will append the needed
/// file name extensions, using .chkpt for the checkpoint and .hdr for a header containing various system data. A file
/// named filename.bak contains the previous checkpoint and is available to the user in case a checkpoint becomes corrupted.
/// </summary>
/// <param name="filename">A file name, normally for sharing between all the group members.</param>
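/// <example>
/// A minimal sketch of the call order implied by the checks in SetSecure(): Persistent must be called before SetSecure(), and SetSecure()
/// before the group is joined. The group name, the file name and the form of the Group constructor are assumptions made for illustration.
/// <code>
/// Group g = new Group("inventory");     // constructor form assumed
/// g.Persistent("inventory-state");      // base file name; Vsync appends the extensions
/// g.SetSecure();                        // throws a VsyncException unless Persistent was called first
/// g.Join();
/// </code>
/// </example>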
public void Persistent(string filename)
{
this.myCheckpointFile = filename;
}
internal bool isPersistent()
{
return this.myCheckpointFile != null;
}
internal void setupPersistentFile()
{
int retryCnt = 0;
bool tryagain = true;
while (tryagain)
{
tryagain = false;
byte[] theKey;
if (tsl == 0)
{
tsl = Msg.toBArray((long)0).Length;
}
try
{
if ((this.flags & G_SECURE) != 0)
{
this.InitializeMyAes();
theKey = new byte[this.myAes.KeySize >> 3];
}
else
{
theKey = new byte[16];
}
for (int retry = 0; retry < 15; retry++)
{
try
{
this.myChkptStream = new FileStream(this.myCheckpointFile + ".hdr", FileMode.Open);
if (this.myChkptStream.Read(theKey, 0, theKey.Length) == theKey.Length)
{
break;
}
this.myChkptStream.Close();
this.myChkptStream = null;
Vsync.Sleep(500 * ((Vsync.my_address.GetHashCode() % 10) + retry));
}
catch (FileNotFoundException)
{
// Lucky me: I get to create the persistent file
throw;
}
catch (IOException)
{
// Can happen if a race arises with many trying to join concurrently
Vsync.Sleep(500 * ((Vsync.my_address.GetHashCode() % 10) + retry));
}
}
if (this.myChkptStream == null)
{
throw new VsyncException("Unable to open checkpoint file <" + this.myCheckpointFile + ".hdr> (tried 15 times)");
}
if ((this.flags & G_SECURE) != 0 && !this.userSpecifiedKey)
{
this.myAESkey = theKey;
SetAesKey(this.myAes, this.myAESkey);
}
long ts = TypeSignature(this);
byte[] tsb = new byte[Msg.toBArray(ts).Length];
if (this.myChkptStream.Read(tsb, 0, tsl) != tsl)
{
throw new IOException("Unexpected EOF");
}
long chkts = (long)Msg.BArrayToObjects(tsb)[0];
if (ts != chkts)
{
Vsync.WriteLine("WARNING: Checkpoint for <" + this.gname + "> records different type signatures than this member (using it anyhow, but this can cause instability in your application)");
}
this.myChkptStream.Close();
this.myChkptStream = null;
}
catch (FileNotFoundException)
{
// Ignore; this arises normally during the initial group create
try
{
this.myChkptStream = new FileStream(this.myCheckpointFile + ".hdr", FileMode.Create);
}
catch
{
if (++retryCnt == 5)
{
throw new VsyncException("Unable to create or access " + this.myCheckpointFile + ".hdr");
}
Vsync.Sleep(500 * ((Vsync.my_address.GetHashCode() % 10) + retryCnt));
tryagain = true;
continue;
}
if ((this.flags & G_SECURE) != 0)
{
theKey = this.myAESkey ?? new byte[this.myAes.KeySize >> 3];
}
else
{
theKey = new byte[16];
}
byte[] kOut = new byte[theKey.Length];
if ((this.flags & G_SECURE) != 0 && !this.userSpecifiedKey)
{
Buffer.BlockCopy(theKey, 0, kOut, 0, Buffer.ByteLength(theKey));
}
this.myChkptStream.Write(kOut, 0, theKey.Length);
long ts = TypeSignature(this);
byte[] tsb = Msg.toBArray(ts);
this.myChkptStream.Write(tsb, 0, tsb.Length);
this.myChkptStream.Close();
this.myChkptStream = null;
}
catch (Exception e)
{
throw new VsyncException("Unexpected file I/O exception " + e.Message + " on checkpoint file", e);
}
}
}
internal bool makingCheckpoint = false;
/// <summary>
/// Causes the current group state to be written to the checkpoint file
/// </summary>
/// <param name="v">The view that is passed to each registered checkpoint maker</param>
public void MakeCheckpoint(View v)
{
if (!this.VsyncCallStart())
{
return;
}
this.makingCheckpoint = true;
this.myChkptStream = new FileStream(this.myCheckpointFile + ".tmp", FileMode.Create);
this.inhibitEOC = true;
using (var tmpLockObj = new LockAndElevate(this.theChkptMakersLock))
{
if (this.theChkptMakers.Count > 0)
{
foreach (ChkptMaker cpm in this.theChkptMakers)
{
cpm(v);
}
}
}
this.inhibitEOC = false;
if (this.makingCheckpoint)
{
this.SendChkpt();
}
this.VsyncCallDone();
}
internal void MakeCheckpointIfLeader()
{
if (this.theView.IAmLeader())
{
this.MakeCheckpoint(this.theView);
}
if (this.CheckpointFrequency > 0)
{
Vsync.OnTimerThread(this.CheckpointFrequency, this.MakeCheckpointIfLeader);
}
}
/// <summary>
/// Completion identifier used in DiskLogger
/// </summary>
public class CompletionTag : IComparable, IComparable<CompletionTag>, IEqualityComparer, IEqualityComparer<CompletionTag>, IEquatable<CompletionTag>
{
/// <summary>
/// Sender of the SafeSend
/// </summary>
public readonly Address sender;
/// <summary>
/// ViewID in which it was sent
/// </summary>
public readonly int vid;
/// <summary>
/// Msgid of the message
/// </summary>
public readonly int msgid;
internal readonly Msg theMsg;
internal bool ordered; // True if final ordering is known
internal bool done; // True if application has done its update
/// <exclude></exclude>
public CompletionTag(Msg m)
{
if (m.sender == null || m.sender.isNull() || m.vid == Msg.UNINITIALIZED || m.msgid == Msg.UNINITIALIZED)
{
throw new VsyncException("CompletionTag: illegal msg sender/vid/msgid");
}
this.sender = m.sender;
this.vid = m.vid;
this.msgid = m.msgid;
this.theMsg = m;
}
/// <exclude></exclude>
public static bool operator <(CompletionTag first, CompletionTag second)
{
return Compare(first, second) < 0;
}
/// <exclude></exclude>
public static bool operator >(CompletionTag first, CompletionTag second)
{
return Compare(first, second) > 0;
}
/// <exclude></exclude>
public static bool operator <=(CompletionTag first, CompletionTag second)
{
return Compare(first, second) <= 0;
}
/// <exclude></exclude>
public static bool operator >=(CompletionTag first, CompletionTag second)
{
return Compare(first, second) >= 0;
}
/// <exclude></exclude>
public static bool operator ==(CompletionTag first, CompletionTag second)
{
return Compare(first, second) == 0;
}
/// <exclude></exclude>
public static bool operator !=(CompletionTag first, CompletionTag second)
{
return Compare(first, second) != 0;
}
/// <exclude></exclude>
public static int Compare(CompletionTag first, CompletionTag second)
{
if (object.ReferenceEquals(first, second))
{
return 0;
}
if (object.ReferenceEquals(first, null))
{
return -1;
}
if (object.ReferenceEquals(second, null))
{
return 1;
}
int comparison = first.sender.CompareTo(second.sender);
if (comparison != 0)
{
return comparison;
}
comparison = first.vid.CompareTo(second.vid);
if (comparison != 0)
{
return comparison;
}
comparison = first.msgid.CompareTo(second.msgid);
return comparison;
}
/// <exclude></exclude>
public int CompareTo(object other)
{
return this.CompareTo(other as CompletionTag);
}
/// <exclude></exclude>
public int CompareTo(CompletionTag other)
{
return Compare(this, other);
}
/// <exclude></exclude>
public override bool Equals(object other)
{
return Compare(this, other as CompletionTag) == 0;
}
/// <exclude></exclude>
public bool Equals(CompletionTag other)
{
return Compare(this, other) == 0;
}
/// <exclude></exclude>
public new bool Equals(object first, object second)
{
return Compare(first as CompletionTag, second as CompletionTag) == 0;
}
/// <exclude></exclude>
public bool Equals(CompletionTag first, CompletionTag second)
{
return Compare(first, second) == 0;
}
/// <exclude></exclude>
public override int GetHashCode()
{
return this.sender.GetHashCode() + (this.vid * 100010057) + (this.msgid * 71043311);
}
/// <exclude></exclude>
public int GetHashCode(object other)
{
return other.GetHashCode();
}
/// <exclude></exclude>
public int GetHashCode(CompletionTag other)
{
return other.GetHashCode();
}
/// <exclude></exclude>
public override string ToString()
{
return "[" + this.sender + "::" + this.vid + ":" + this.msgid + "]";
}
}
/// <summary>
/// Durability Methods must implement this API
/// </summary>
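/// <example>
/// A skeleton showing the shape of an implementation; it does nothing durable and is only a sketch. NullDurability is a hypothetical
/// class name, and the Group. prefix assumes that IDurability and CompletionTag are nested in Group, as they appear to be in this file.
/// <code>
/// class NullDurability : Group.IDurability
/// {
///     private Group.CompletionTag last;   // tag of the most recently logged message
///
///     public Group.CompletionTag LogMsg(Msg m)
///     {
///         this.last = new Group.CompletionTag(m);   // a real logger would also write m to stable storage here
///         return this.last;
///     }
///
///     public Group.CompletionTag GetCompletionTag() { return this.last; }
///     public void SetOrder(Msg m) { }
///     public void BeginAsyncUpdate(Group.CompletionTag ct) { }
///     public void Done(Group.CompletionTag ct) { }
///     public void NewView(View v) { }
///     public void PlayBack() { }
///     public void Shutdown() { }
/// }
/// </code>
/// </example>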
public interface IDurability
{
/// <summary>
/// Logs a message, returns a new CompletionTag object for it
/// </summary>
/// <param name="m">The message to log</param>
/// <returns>The associated sender:vid:msgid</returns>
CompletionTag LogMsg(Msg m);
/// <summary>
/// Returns the CompletionTag of the most recently logged message
/// </summary>
/// <returns>The associated sender:vid:msgid</returns>
CompletionTag GetCompletionTag();
/// <summary>
/// Called when SafeSend sets the delivery order for a pending message
/// </summary>
/// <param name="m">The associated message</param>
void SetOrder(Msg m);
/// <summary>
/// Starts an asynchronous update; the user should later call Done(ct)
/// </summary>
/// <param name="ct">The associated completion tag</param>
void BeginAsyncUpdate(CompletionTag ct);
/// <summary>
/// Called by the application when done processing an update
/// </summary>
/// <param name="ct">The associated completion tag</param>
void Done(CompletionTag ct);
/// <summary>
/// Invoked by Vsync when a new view becomes defined. In the leader, NewView is called first, then PlayBack().
/// </summary>
/// <param name="v">The new view</param>
void NewView(View v);
/// <summary>
/// On reinitialization of a group after all members fail, or when the view changes
/// and a new process assumes the role of group leader, invoked by Vsync (in the leader only) to "play back"
/// any messages that might not be stable in the group. Ideally, the new leader would
/// only need to take actions after a recovery from a full group failure, at which point
/// it should replay all messages that didn't reach the state of being "Done" at all the
/// members where they were delivered during the previous run. It isn't possible to avoid
/// duplicates so that becomes the problem of the application, which will need to filter them
/// out.
/// </summary>
void PlayBack();
/// <summary>
/// Tells the logger to shut itself down, called on group Terminate()
/// </summary>
void Shutdown();
}
internal IDurability durabilityMethod;
internal volatile bool IAmSafeSendLeader;
/// <summary>
/// Overrides the default (in-memory) durability rule for SafeSend
/// </summary>
/// <param name="theMethod">An object implementing the IDurability API</param>
public void SetDurabilityMethod(IDurability theMethod)
{
if (this.durabilityMethod != null)
{
throw new VsyncException("SetDurabilityMethod: method was already registered");
}
this.durabilityMethod = theMethod;
}
internal delegate void DLQuery(Address[] senders, int[] vids, int[] msgids);
internal delegate void DLDone(int nCleanup, Address[] senders, int[] vids, int[] msgids);
internal delegate void DLCleanup();
/// <summary>
/// This built-in class provides log-file based durability for SafeSend. It requires a prior call to g.SetSafeSendThreshold with a small integer constant
/// specifying how many of the group members must maintain logs. It throws SafeSendException if an attempt is made to issue a SafeSend in a view with fewer than
/// this number of members.
/// </summary>
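/// <example>
/// A minimal sketch of enabling the logger. The threshold value 3 and the file name are illustrative, the argument form of
/// SetSafeSendThreshold is assumed from the description above, and the Group. prefix assumes DiskLogger is nested in Group.
/// <code>
/// g.SetSafeSendThreshold(3);                                    // the DiskLogger constructor rejects thresholds below 2
/// g.SetDurabilityMethod(new Group.DiskLogger(g, "safesend"));   // a member of rank r below the threshold logs to safesend-r.dat
/// </code>
/// </example>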
public sealed class DiskLogger : IDurability, IDisposable
{
private bool disposed;
internal bool dirty;
internal LockObject theLock;
internal Group theGroup;
internal Thread theThread;
internal string theFileName = "no name";
internal FileStream theFileStream;
internal int myRank = -1;
internal Semaphore SleepOn = new Semaphore(0, int.MaxValue);
private List<CompletionTag> PendingList = new List<CompletionTag>();
private readonly List<CompletionTag> AsyncList = new List<CompletionTag>();
private readonly List<CompletionTag> DoneList = new List<CompletionTag>();
/// <exclude></exclude>
public DiskLogger(Group g, string filename)
{
if (g.safeSendThreshold <= 1)
{
throw new SafeSendException("DiskLogger requires SafeSendThreshold >= 2");
}
this.theGroup = g;
this.theFileName = filename;
this.theLock = new LockObject("DiskLogger<" + this.theGroup.gname + ">");
this.doSetup();
}
/// <summary>
/// Disposes of the SleepOn semaphore and the log file stream
/// </summary>
public void Dispose()
{
this.Dispose(true);
}
private void Dispose(bool disposing)
{
if (disposing)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
if (this.disposed)
{
return;
}
this.disposed = true;
}
this.SleepOn.Release();
this.SleepOn.Dispose();
if (this.theFileStream != null)
{
this.theFileStream.Dispose();
}
this.SleepOn = null;
}
}
/// <exclude></exclude>
public CompletionTag LogMsg(Msg m)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
this.dirty = true;
return this.LogMsg(this.theFileStream, m);
}
}
// Must call with appropriate lock
private CompletionTag LogMsg(FileStream whichFileStream, Msg m)
{
if (this.myRank == -1 || this.myRank >= this.theGroup.GetSafeSendThreshold())
{
return null;
}
CompletionTag ct = new CompletionTag(m);
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
if (whichFileStream == null)
{
throw new SafeSendException("Error: attempted to do a SafeSend in a group that has fewer than SafeSendThreshold members");
}
if (whichFileStream == this.theFileStream)
{
foreach (CompletionTag oct in this.PendingList)
{
if (oct.sender == ct.sender && oct.vid == ct.vid && oct.msgid == ct.msgid)
{
return ct;
}
}
this.PendingList.Add(ct);
}
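// Write to the stream we were given: RewriteLog passes the new temporary log here, not this.theFileStream
this.WriteObjectsTo(whichFileStream, new object[] { m });
}
return ct;
}
internal void WriteObjects(params object[] obs)
{
this.WriteObjectsTo(this.theFileStream, obs);
}
// Appends one record to fs: a 4-byte length, high-order byte first, followed by the serialized objects
private void WriteObjectsTo(FileStream fs, object[] obs)
{
byte[] ba = Msg.toBArray(obs);
byte[] baLen = new byte[4];
int bl = ba.Length;
baLen[0] = (byte)((bl >> 24) & 0xFF);
baLen[1] = (byte)((bl >> 16) & 0xFF);
baLen[2] = (byte)((bl >> 8) & 0xFF);
baLen[3] = (byte)(bl & 0xFF);
fs.Seek(0, SeekOrigin.End);
fs.Write(baLen, 0, 4);
fs.Write(ba, 0, ba.Length);
fs.Flush(true);
}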
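// Sketch of the on-disk record layout produced by WriteObjects (illustrative, not part of Vsync): each record is a 4-byte length with the
// high-order byte first, followed by that many payload bytes. Given a 4-byte header in a hypothetical local byte[] named hdr, a reader
// could recover the length with:
//     int len = (hdr[0] << 24) | (hdr[1] << 16) | (hdr[2] << 8) | hdr[3];
// This is the opposite byte order from the length prefix parsed in LoadCheckpointFromFile, which treats lb[0] as the low-order byte.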
/// <exclude></exclude>
public CompletionTag GetCompletionTag()
{
Msg m = this.theGroup.curMsg();
if (m == null)
{
throw new VsyncException("DiskLogger: curmsg null");
}
return new CompletionTag(m);
}
/// <exclude></exclude>
/// <remarks>Called when the order is determined for message m; these calls are made in the sequential delivery order.</remarks>
public void SetOrder(Msg m)
{
if (this.myRank == -1 || this.myRank >= this.theGroup.GetSafeSendThreshold())
{
return;
}
bool fnd = false;
CompletionTag mct = m.ct;
if (mct == null)
{
throw new VsyncException("ct null in SetOrder");
}
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
List<CompletionTag> newPendingList = new List<CompletionTag>();
foreach (CompletionTag ct in this.PendingList)
{
if (ct.sender == mct.sender && ct.vid == mct.vid && ct.msgid == mct.msgid)
{
fnd = true;
if (ct.ordered)
{
return;
}
}
else if (ct.ordered)
{
newPendingList.Add(ct);
}
else if (m != null)
{
m.ct.ordered = true;
newPendingList.Add(m.ct);
m = null;
}
else
{
newPendingList.Add(ct);
}
}
if (m != null)
{
m.ct.ordered = true;
newPendingList.Add(m.ct);
}
if (!fnd)
{
throw new VsyncException("DiskLogger.SetOrder: couldn't find " + mct.sender + "::" + mct.vid + ":" + mct.msgid);
}
this.PendingList = newPendingList;
}
}
/// <exclude></exclude>
public void BeginAsyncUpdate(CompletionTag ct)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
this.AsyncList.Add(ct);
}
}
/// <exclude></exclude>
public void Done(CompletionTag ct)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
if (this.AsyncList.Contains(ct))
{
// In this case, expects to be called twice: once being the automated event from the callback handler, once by the asynchronous completion logic
this.AsyncList.Remove(ct);
}
else if (!this.DoneList.Contains(ct))
{
this.DoneList.Add(ct);
this.dirty = true;
if (this.theGroup.IAmSafeSendLeader && this.DoneList.Count == 1000)
{
this.SleepOn.Release();
}
}
}
}
/// <exclude></exclude>
public void Shutdown()
{
this.SleepOn.Release();
}
/// <exclude></exclude>
public void NewView(View v)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
int rank = v.GetMyRank();
if (rank == -1 || rank >= this.theGroup.GetSafeSendThreshold())
{
return;
}
if (this.myRank != rank && this.theFileStream != null)
{
this.myRank = rank;
this.theFileStream.Flush(true);
this.theFileStream.Dispose();
Vsync.Sleep(50);
this.theFileStream = null;
if (rank == -1)
{
return;
}
}
else if (this.theFileStream != null)
{
return;
}
else
{
this.myRank = rank;
}
if (this.theFileStream == null)
{
for (int retry = 0; retry < 60; retry++)
{
try
{
// Open it in exclusive mode to be sure that the previous owner, if any, has finished writing to it. Could require a few tries
this.theFileStream = new FileStream(this.theFileName + "-" + this.myRank + ".dat", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
break;
}
catch (IOException e)
{
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Attempt failed, IOException=" + e);
}
Vsync.Sleep(500);
}
}
if (this.theFileStream == null)
{
throw new VsyncException("DiskLogger: can't access " + this.theFileName + "-" + this.myRank + ".dat (tried for 30 secs)");
}
this.ReadLog();
}
if (rank == 0 && this.theThread == null)
{
this.theThread = new Thread(() =>
{
try
{
Timeout myTO = new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_NULLREPLY);
int myThreshold = this.theGroup.GetSafeSendThreshold() - 1;
this.SleepOn.WaitOne(30 * 1000);
while (VsyncSystem.VsyncActive && (this.theGroup.GroupOpen || !this.theGroup.WasOpen))
{
Address[] senders;
int[] vids;
int[] msgids;
List<bool[]> status = new List<bool[]>();
// First, construct a list of completion tags that should be checked, in delivery order
using (var tmpLockObj1 = new LockAndElevate(this.theLock))
{
if (!this.dirty)
{
continue;
}
senders = new Address[this.DoneList.Count];
vids = new int[this.DoneList.Count];
msgids = new int[this.DoneList.Count];
int idx = 0;
foreach (CompletionTag ct in this.DoneList)
{
senders[idx] = ct.sender;
vids[idx] = ct.vid;
msgids[idx] = ct.msgid;
++idx;
}
}
// Now find out which are done everywhere; question is where is the "cutoff point", again in delivery order
if (this.theGroup.Query(myThreshold, myTO, Vsync.DISKLOGGER, senders, vids, msgids, Group.EOL, status) == myThreshold)
{
int idx = 0;
int ndx = 0;
while (idx < senders.Length)
{
bool allTrue = true;
foreach (bool[] ba in status)
{
if (!ba[idx])
{
allTrue = false;
break;
}
}
if (allTrue)
{
senders[ndx] = senders[idx];
vids[ndx] = vids[idx];
msgids[ndx] = msgids[idx];
++ndx;
}
++idx;
}
if (ndx > 0)
{
Vsync.ArrayResize(ref senders, ndx);
Vsync.ArrayResize(ref vids, ndx);
Vsync.ArrayResize(ref msgids, ndx);
using (var tmpLockObj1 = new LockAndElevate(this.theLock))
{
this.SetDoneBit(senders, vids, msgids);
}
this.WriteObjects(ndx, senders, vids, msgids);
List<string> oks = new List<string>();
// Finally, if everyone successfully logs that those are collectable, safe to collect.
if (this.theGroup.Query(myThreshold, myTO, Vsync.DISKLOGGER, ndx, senders, vids, msgids, Group.EOL, oks) == myThreshold)
{
this.theGroup.Query(myThreshold, myTO, Vsync.DISKLOGGER, Group.EOL, new List<string>());
}
}
}
this.SleepOn.WaitOne(30 * 1000);
}
}
catch (VsyncShutdownException)
{
return;
}
catch (ThreadInterruptedException)
{
return;
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "DiskLogger<" + this.theGroup.gname + ">: garbage collector", IsBackground = true };
this.theThread.Start();
}
}
}
private void doSetup()
{
// Called as the first phase of garbage collection to gather statistics on the specified completion tags
this.theGroup.doRegister(Vsync.DISKLOGGER, new DLQuery((senders, vids, msgids) =>
{
if (this.myRank <= 0 || this.myRank >= this.theGroup.GetSafeSendThreshold())
{
this.theGroup.NullReply();
return;
}
string inq = " ";
for (int i = 0; i < senders.Length; i++)
{
inq += senders[i] + "::" + vids[i] + ":" + msgids[i] + " ";
}
bool[] myStatus = new bool[senders.Length];
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
foreach (CompletionTag ct in this.PendingList)
{
for (int idx = 0; idx < senders.Length; idx++)
{
if (ct.done && senders[idx] == ct.sender && vids[idx] == ct.vid && msgids[idx] == ct.msgid)
{
myStatus[idx] = true;
break;
}
}
}
foreach (CompletionTag ct in this.DoneList)
{
for (int idx = 0; idx < senders.Length; idx++)
{
if (senders[idx] == ct.sender && vids[idx] == ct.vid && msgids[idx] == ct.msgid)
{
myStatus[idx] = true;
break;
}
}
}
}
string ms = string.Empty;
foreach (bool b in myStatus)
{
ms += b ? "1 " : "0 ";
}
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Disklogger: Responding to an inquiry {" + inq + "}, myStatus={" + ms + "}");
}
this.theGroup.doReply(myStatus);
}));
// Second phase notes that these messages are now collectable but doesn't collect them yet
this.theGroup.doRegister(Vsync.DISKLOGGER, new DLDone((howMany, senders, vids, msgids) =>
{
if (this.myRank <= 0 || this.myRank >= this.theGroup.GetSafeSendThreshold())
{
this.theGroup.NullReply();
return;
}
string inq = " ";
for (int i = 0; i < senders.Length; i++)
{
inq += senders[i] + "::" + vids[i] + ":" + msgids[i] + " ";
}
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
this.SetDoneBit(senders, vids, msgids);
this.WriteObjects(howMany, senders, vids, msgids);
}
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Disklogger: Responding to a prepare message {" + inq + "}");
}
this.theGroup.doReply("OK");
}));
// Third phase: the entries marked done can now be garbage collected by rewriting the log
this.theGroup.doRegister(Vsync.DISKLOGGER, new DLCleanup(() =>
{
if (this.myRank == -1 || this.myRank >= this.theGroup.GetSafeSendThreshold())
{
return;
}
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Disklogger: Received a commit message");
}
this.RewriteLog(false);
this.theGroup.doReply("OK");
}));
}
internal void RewriteLog(bool verbose)
{
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
try
{
int nc = 0;
if (verbose)
{
Vsync.WriteLine("... creating temp file <" + this.theFileName + "-" + this.myRank + ".tmp>");
}
using (FileStream newFileStream = new FileStream(this.theFileName + "-" + this.myRank + ".tmp", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
List<CompletionTag> newPendingList = new List<CompletionTag>();
foreach (CompletionTag ct in this.PendingList)
{
if (!ct.done)
{
newPendingList.Add(ct);
this.LogMsg(newFileStream, ct.theMsg);
}
else
{
++nc;
}
}
if (verbose)
{
Vsync.WriteLine("... Wrote " + (this.PendingList.Count - nc) + " SafeSend messages to new log file");
}
this.PendingList = newPendingList;
this.theFileStream.Flush();
this.theFileStream.Dispose();
newFileStream.Flush();
}
Vsync.Sleep(250);
if (verbose)
{
Vsync.WriteLine("... Renaming file");
}
File.Replace(this.theFileName + "-" + this.myRank + ".tmp", this.theFileName + "-" + this.myRank + ".dat", this.theFileName + "-" + this.myRank + ".bak");
Vsync.Sleep(250);
this.theFileStream = new FileStream(this.theFileName + "-" + this.myRank + ".dat", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Disklogger: Successful garbage collection of " + nc + " SafeSend messages");
}
if (verbose)
{
Vsync.WriteLine("... DiskLogger logfile repair successful");
}
}
catch (IOException e)
{
throw new VsyncException("DiskLogger: Unable to handle I/O exception " + e.Message, e);
}
}
}
private int SetDoneBit(Address[] senders, int[] vids, int[] msgids)
{
int fnd = 0;
foreach (CompletionTag ct in this.PendingList)
{
for (int idx = 0; idx < senders.Length; idx++)
{
if (senders[idx] == ct.sender && vids[idx] == ct.vid && msgids[idx] == ct.msgid)
{
ct.done = true;
++fnd;
break;
}
}
}
return fnd;
}
/// <exclude></exclude>
public void PlayBack()
{
List<Msg> toResend = new List<Msg>();
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("** Disklogger: entering PlayBack");
}
using (var tmpLockObj = new LockAndElevate(this.theLock))
{
if (this.theFileStream == null)
{
return;
}
View theView;
using (var tmpLockObj1 = new LockAndElevate(this.theGroup.ViewLock))
{
theView = this.theGroup.theView;
}
if (theView.members.Length < this.theGroup.GetSafeSendThreshold())
{
return;
}
this.theGroup.IAmSafeSendLeader = true;
List<CompletionTag> newPendingList = new List<CompletionTag>();
foreach (CompletionTag ct in this.PendingList)
{
if (!ct.done)
{
toResend.Add(ct.theMsg);
}
else
{
newPendingList.Add(ct);
}
}
this.PendingList = newPendingList;
}
if (toResend.Count > 0)
{
new Thread(() =>
{
try
{
foreach (Msg m in toResend)
{
if (VsyncSystem.VsyncActive)
{
this.theGroup.SafeSend(Msg.BArrayToObjects(m.payload));
}
}
}
catch (VsyncShutdownException)
{
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "SafeSend:Playback<" + this.theGroup.gname + ">", IsBackground = true }.Start();
}
}
// Caller holds theLock
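// Log layout, as parsed below: each record is a 4-byte big-endian length followed by that many bytes of
// serialized objects. For example, the prefix bytes 00 00 01 02 decode as
//     (0x00 << 24) | (0x00 << 16) | (0x01 << 8) | 0x02 = 258
// so the next 258 bytes form one record. A record decodes either to a single Msg (a logged message) or to
// (int, Address[], int[], int[]), which marks the listed (sender, vid, msgid) entries as done.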
internal void ReadLog()
{
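byte[] hdr = new byte[4];
bool corrupt = false;
this.theFileStream.Seek(0, SeekOrigin.Begin);
while (this.theFileStream.Read(hdr, 0, 4) == 4)
{
// Keep the header buffer separate from the record buffer so that a record shorter than 4 bytes
// cannot shrink the array used for the next header read.
int len = (hdr[0] << 24) | (hdr[1] << 16) | (hdr[2] << 8) | hdr[3];
byte[] ba = new byte[Math.Max(len, 0)];
int nb = 0;
// A negative length means the prefix itself is damaged; fall through to the short-read branch
if (len >= 0 && (nb = this.theFileStream.Read(ba, 0, len)) == len)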
{
object[] obs = Msg.BArrayToObjects(ba);
if (Msg.CheckTypes(obs, typeof(Msg)))
{
CompletionTag ct = new CompletionTag((Msg)obs[0]);
bool fnd = false;
foreach (CompletionTag oct in this.PendingList)
{
if (oct.sender == ct.sender && oct.vid == ct.vid && oct.msgid == ct.msgid)
{
fnd = true;
break;
}
}
if (!fnd)
{
this.PendingList.Add(ct);
}
}
else if (Msg.CheckTypes(obs, typeof(int), typeof(Address[]), typeof(int[]), typeof(int[])))
{
int x = 1; // obs[0] is howMany, which is not needed here, so start at index 1
Address[] senders = (Address[])obs[x++];
int[] vids = (int[])obs[x++];
int[] msgids = (int[])obs[x];
foreach (CompletionTag ct in this.PendingList)
{
for (int idx = 0; idx < senders.Length; idx++)
{
if (senders[idx] == ct.sender && vids[idx] == ct.vid && msgids[idx] == ct.msgid)
{
ct.done = true;
break;
}
}
}
}
else
{
corrupt = true;
Vsync.WriteLine("Warning: DiskLogger unable to interpret data read from a log file... truncating and initiating autorepair");
break;
}
}
else
{
corrupt = true;
Vsync.WriteLine("Warning: DiskLogger expected " + len + " bytes but EOF after " + nb + " bytes.... truncating and initiating autorepair");
break;
}
}
if (corrupt)
{
Vsync.WriteLine("WARNING: Vsync detected a damaged or partially written DiskLogger log file... auto-repairing");
this.RewriteLog(true);
}
int nd = 0;
foreach (CompletionTag ct in this.PendingList)
{
if (ct.done)
{
++nd;
}
}
if ((VsyncSystem.Debug & VsyncSystem.DISKLOGGER) != 0)
{
Vsync.WriteLine("Disklogger ReadLog finished. Read " + this.PendingList.Count + " messages, of which " + nd + " were flagged as completed");
}
}
}
/// <summary>
/// Requests that the state of the group be checkpointed automatically at the specified interval
/// </summary>
/// <param name="interval">inter-checkpoint interval in seconds</param>
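/// <example>
/// A minimal usage sketch. The group name and the <c>Group(string)</c> constructor are assumptions made for
/// illustration; neither is defined in this part of the file.
/// <code>
/// Group g = new Group("inventory");   // hypothetical group
/// g.SetCheckpointFrequency(30);       // interval is in seconds; stored internally as 30 * 1000 = 30000 ms
/// </code>
/// A call with an interval of 0 stores 0, so that call arms no checkpoint timer.
/// </example>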
public void SetCheckpointFrequency(int interval)
{
this.CheckpointFrequency = Math.Max(-1, Math.Min(interval, int.MaxValue / 1000) * 1000);
if (this.CheckpointFrequency > 0)
{
Vsync.OnTimerThread(this.CheckpointFrequency, this.MakeCheckpointIfLeader);
}
}
/// <summary>Register view callback handler. </summary>
/// <param name="cbproc">Handler for view callbacks</param>
/// <remarks>
/// A process setting up a group calls RegisterViewHandler to register a callback handler that will be invoked when a new View is received.
/// Read about the <i>virtual synchrony model</i> to learn more about the View property of groups, which is a key feature of the Vsync system
/// and will be very useful to you in designing applications that make full use of Vsync.
/// </remarks>
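/// <example>
/// A minimal sketch, assuming the <c>ViewHandler</c> delegate takes a single <c>View</c> argument and that
/// <c>Group(string)</c> is the constructor (neither is shown in this part of the file). <c>members</c> and
/// <c>GetMyRank()</c> are the View members used elsewhere in this file.
/// <code>
/// Group g = new Group("inventory");   // hypothetical group name
/// g.RegisterViewHandler(v =>
///     Console.WriteLine("New view: " + v.members.Length + " members, my rank is " + v.GetMyRank()));
/// g.Join();                           // register handlers first, then join
/// </code>
/// </example>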
public void RegisterViewHandler(ViewHandler cbproc)
{
VHCallBack vcb = new VHCallBack(false, cbproc);
using (var tmpLockObj = new LockAndElevate(this.ViewHandlers.vhListLock))
{
this.ViewHandlers.vhList.Add(vcb);
}
}
internal class Initer
{
private readonly Group myGroup;
public Initer(Group g)
{
this.myGroup = g;
}
public static Initer operator +(Initer a, Initializer maker)
{
a.myGroup.RegisterInitializer(maker);
return a;
}
}
internal Initer Initializer;
/// <summary>
/// A process setting up a group calls RegisterInitializer to declare the method that will initialize the group if this member turns out to create it
/// </summary>
/// <param name="initproc">Method that initializes the group if this member turns out to create it</param>
public void RegisterInitializer(Initializer initproc)
{
this.theInitializer = initproc;
}
internal void doRegisterViewCB(VHCallBack vcb)
{
using (var tmpLockObj = new LockAndElevate(this.ViewHandlers.vhListLock))
{
this.ViewHandlers.vhList.Add(vcb);
}
}
private void TypeCheck(object[] obs)
{
if (obs == null || obs.Length < 1)
{
throw new VsyncException("Wrong number of arguments");
}
if (obs[0].GetType() == typeof(Msg))
{
return;
}
if (obs[0].GetType() != typeof(int))
{
throw new VsyncException("First argument should be a request type");
}
int request = (int)obs[0];
if (this.Handlers[request] == null)
{
return;
}
foreach (CallBack cb in this.Handlers[request].hList)
{
if (TypeMatch(obs, cb))
{
return;
}
}
string ts = string.Empty;
foreach (object o in obs)
{
ts += o.GetType() + "...";
}
throw new VsyncException("No callback for request " + Vsync.rToString((int)obs[0]) + " matches the provided type signature: " + ts);
}
internal int rcode(object[] obs)
{
object rc = obs[0];
Type t = rc.GetType();
if (t == typeof(int))
{
return (int)rc;
}
if (t == typeof(byte))
{
return (byte)rc;
}
throw new VsyncException("Request code was of type " + rc.GetType() + " but expected byte or int!");
}
private static bool TypeMatch(object[] obs, CallBack cb)
{
if (cb.cbProc.ptypes.Length != obs.Length - 1)
{
return false;
}
for (int i = 0; i < cb.cbProc.ptypes.Length; i++)
{
if (obs[i + 1] == null || (obs[i + 1].GetType() != cb.cbProc.ptypes[i] && cb.cbProc.ptypes[i] != typeof(object)))
{
return false;
}
}
return true;
}
private static bool DHTTypeMatch(object[] obs, CallBack cb)
{
if (cb.cbProc.ptypes.Length != obs.Length)
{
return false;
}
for (int i = 0; i < obs.Length; i++)
{
if (obs[i] == null || (obs[i].GetType() != cb.cbProc.ptypes[i] && cb.cbProc.ptypes[i] != typeof(object)))
{
return false;
}
}
return true;
}
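// Returns a signature (typed as a 64-bit long) computed from an HMAC-SHA256 digest of the group's type signature string.
// The digest is 256 bits (32 bytes), so we loop over it and XOR the bytes together. Note that the shift below
// uses (i & 3), so only the low 32 bits of the result are ever populated.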
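// Worked example of the fold used below: byte i of the 32-byte digest is XORed in at bit offset 8 * (i mod 4),
// because the shift is ((i & 3) << 3). Bytes 0, 4, 8, ... land in bits 0-7, bytes 1, 5, 9, ... in bits 8-15,
// bytes 2, 6, ... in bits 16-23 and bytes 3, 7, ... in bits 24-31. For instance, i = 6 gives (6 & 3) << 3 = 16.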
internal static long TypeSignature(Group g)
{
if (g.TypeSig != 0)
{
return g.TypeSig;
}
List<string> sigs = new List<string>();
int ridx = 0;
foreach (myHandlers mh in g.Handlers.ListofhLists)
{
if (mh != null)
{
foreach (CallBack cb in mh.hList)
{
string s = "[" + ridx++ + "]";
foreach (Type pt in cb.cbProc.ptypes)
{
s += pt + ":";
}
sigs.Add(s);
}
}
}
tokenInfo theToken;
using (var tmpLockObj = new LockAndElevate(g.TokenLock))
{
theToken = g.theToken;
}
if (theToken != null)
{
foreach (AggInfo ag in g.AggTypes)
{
sigs.Add(ag.KVT);
}
}
sigs.Sort();
string sig = string.Empty;
foreach (string s in sigs)
{
sig = sig + s + ";";
}
if ((g.flags & G_SECURE) != 0)
{
sig += "(secure)";
}
g.TypeSigStr = sig;
if ((VsyncSystem.Debug & VsyncSystem.TYPESIGS) != 0)
{
Vsync.WriteLine("Computing type signature for " + g.gname + ", long form is " + sig);
}
using (MemoryStream ms = new MemoryStream(Msg.StringToBytes(sig)))
using (HMAC hm = new HMACSHA256(new byte[] { 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11, 56, 78, 9, 23, 10, 87, 33, 11 }))
using (var tmpLockObj = new LockAndElevate(Msg.VerifyLock))
{
byte[] ba = hm.ComputeHash(ms);
long rval = 0;
for (int i = 0; i < ba.Length; i++)
{
rval ^= (((long)ba[i]) & 0xFF) << ((i & 3) << 3);
}
if ((VsyncSystem.Debug & VsyncSystem.TYPESIGS) != 0)
{
Vsync.WriteLine("... returning compressed version " + rval);
}
if (rval == 0)
{
rval = 0x1010101001010101;
}
return rval;
}
}
internal const int CREATE = 0x0001;
internal const int JOIN = 0x0002;
internal const int CANBEORACLE = 0x0004;
/// <summary>
/// A process calls Join to join or create a group; Vsync will figure out what to do. Set up the group in advance by registering data types, handlers, aggregators.
/// </summary>
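/// <example>
/// A sketch of defensive use, assuming a hypothetical group <c>g</c> whose handlers are already registered.
/// In this file, Join throws VsyncException when the group is already open, when the group is still on the
/// RecentlyLeft list (see ThrashingCheck), or when the join itself fails.
/// <code>
/// try
/// {
///     g.Join();   // creates the group if needed, otherwise joins it; returns once the first view is installed
/// }
/// catch (VsyncException e)
/// {
///     Console.WriteLine("Join failed: " + e.Message);
/// }
/// </code>
/// </example>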
public void Join()
{
this.Join(0L);
}
/// <summary>
/// Join a group, specifying an offset into the group state
/// </summary>
/// <param name="off">offset into the group's state</param>
public void Join(long off)
{
if (this.GroupOpen)
{
throw new VsyncException("Group Join/Create but the group was already active");
}
this.ThrashingCheck();
using (var tmpLockObj = new LockAndElevate(this.UniversalP2PHandlers.uhListLock))
using (var tmpLockObj1 = new LockAndElevate(this.UniversalMHandlers.uhListLock))
{
if (this.UniversalP2PHandlers.uhList.Count > 0 || this.UniversalMHandlers.uhList.Count > 0)
{
this.hasUniversalHandlers = true;
}
}
if (this.myCheckpointFile != null)
{
this.setupPersistentFile();
}
Group[] groups = { this };
if (this != Vsync.ORACLE && this != Vsync.VSYNCMEMBERS)
{
this.Handlers.locked = this.UniversalMHandlers.locked = this.UniversalP2PHandlers.locked = true;
}
doJoin(CREATE | JOIN, 0, groups, off);
JoinWait(groups);
}
/// <summary>
/// A process calls JoinExisting to join a group that must be preexisting. Set up the group in advance by registering data types, handlers, aggregators.
/// </summary>
public void JoinExisting()
{
this.JoinExisting(0L);
}
/// <summary>
/// Join an existing group, specifying an offset into the group state
/// </summary>
/// <param name="off">offset into the group's state</param>
public void JoinExisting(long off)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
this.ThrashingCheck();
Group[] groups = { this };
doJoin(JOIN, 0, groups, off);
JoinWait(groups);
}
/// <summary>
/// A process calls Create to create a new group. Set up the group in advance by registering data types, handlers, aggregators.
/// </summary>
public void Create()
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
Group[] groups = { this };
doJoin(CREATE | JOIN, 0, groups, 0L);
JoinWait(groups);
}
private void ThrashingCheck()
{
using (var tmpLockObj = new LockAndElevate(this.RecentlyLeftLock))
{
if (RecentlyLeft.Contains(this.gaddr))
{
// This is to prevent developers from building applications that thrash, putting Vsync under heavy stress
throw new VsyncException("Vsync join error: Illegal to rejoin an existing group within 5 minutes after leaving it.");
}
}
}
/// <summary>
/// A process calls multiCreate to create a set of groups as a single atomic action.
/// </summary>
/// <param name="groups">List of groups to create</param>
/// <remarks>Set up the groups in advance by registering data types, handlers, aggregators.</remarks>
public static void multiCreate(Group[] groups)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
doJoin(CREATE | JOIN, 0, groups, 0L);
JoinWait(groups);
}
/// <summary>
/// A master process calls this overload of multiCreate to create a set of groups containing a set of members as a single atomic action.
/// </summary>
/// <param name="workers">List of processes to initially include in the groups</param>
/// <param name="groups">List of groups</param>
/// <remarks>Set up the groups in advance by registering data types, handlers, aggregators.</remarks>
public static void multiCreate(Address[] workers, Group[] groups)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
doJoin(CREATE | JOIN, 0, workers, groups, 0L);
}
/// <summary>
/// A master process calls this overload of multiJoin to add a set of workers to a list of groups as a single atomic action.
/// </summary>
/// <param name="workers">List of processes to add to the groups</param>
/// <param name="groups">List of groups</param>
/// <remarks>Set up the groups in advance by registering data types, handlers, aggregators.</remarks>
public static void multiJoin(Address[] workers, Group[] groups)
{
multiJoin(workers, groups, 0L);
}
/// <summary>
/// Add a set of workers to a list of groups, specifying an offset into the group state
/// </summary>
/// <param name="workers">List of processes to add to the groups</param>
/// <param name="groups">List of groups</param>
/// <param name="offset">Offset into the group state</param>
public static void multiJoin(Address[] workers, Group[] groups, long offset)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
foreach (Group g in groups)
{
if (!g.HasFirstView)
{
throw new VsyncException("multiJoin: Can only be called by a current member of <" + g.gname + ">");
}
}
doJoin(CREATE | JOIN, 0, workers, groups, offset);
}
/// <summary>
/// A process calls multiJoin to join a set of groups as a single atomic action.
/// </summary>
/// <param name="groups">List of groups to join</param>
/// <remarks>Set up the groups in advance by registering data types, handlers, aggregators.</remarks>
public static void multiJoin(Group[] groups)
{
multiJoin(groups, 0L);
}
/// <summary>
/// Join a set of groups as a single atomic action
/// </summary>
/// <param name="groups">List of groups to join</param>
/// <param name="offset">Offset into the group state</param>
public static void multiJoin(Group[] groups, long offset)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
doJoin(CREATE | JOIN, 0, groups, offset);
JoinWait(groups);
}
/// <summary>
/// Manager for a group calls Terminate to garbage collect the group, leaving members operational.
/// </summary>
/// <remarks>Once this call is issued, if any member tries to issue Vsync system calls on the group, those calls will
/// fail and a warning will print to the console. So don't call Terminate while your group is still active. </remarks>
public void Terminate()
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
if (!this.GroupOpen || (this.flags & G_TERMINATING) != 0)
{
return;
}
doTerminate(new[] { this });
}
/// <summary>
/// Manager for a set of groups calls <c>multiTerminate</c> to garbage collect the groups as a single action, leaving members operational.
/// </summary>
/// <param name="groups">List of the groups to terminate</param>
public static void multiTerminate(Group[] groups)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
foreach (Group g in groups)
{
if (!g.HasFirstView)
{
throw new VsyncException("multiTerminate: Caller must Join <" + g.gname + "> before calling multiTerminate");
}
}
doTerminate(groups);
}
/// <summary>
/// A process calls HasFailed to report a failure
/// </summary>
/// <param name="who">Address of the failed process</param>
/// <remarks>
/// The caller and failed process must both belong to the current view. The target will be forced out of the
/// Vsync system (a poison pill will be sent to it just in case it is still running) and it will drop out of
/// all groups to which it belongs.
/// </remarks>
public void HasFailed(Address who)
{
if (this.theView.GetRankOf(who) == -1)
{
return;
}
Vsync.NodeHasFailed(who, "(From app-level HasFailed)", false);
}
internal Semaphore xferWait = new Semaphore(0, int.MaxValue);
private static void JoinWait(Group[] groups)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
foreach (Group g in groups)
{
if (!g.HasFirstView)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("Waiting on LLINITV barrier lock for gaddr " + g.gaddr + "(Lock [" + ILock.LLINITV + "][" + ILock.GetLockId(ILock.LLINITV, g.gaddr.GetHashCode()) + "])");
}
if (!g.HasFirstView)
{
ILock.Barrier(ILock.LLINITV, g.gaddr).BarrierWait();
}
if ((VsyncSystem.Debug & (VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("After LLINITV barrier lock for gaddr " + g.gaddr + "(Lock [" + ILock.LLINITV + "][" + ILock.GetLockId(ILock.LLINITV, g.gaddr.GetHashCode()) + "])");
}
if (g.joinFailed)
{
// Join but it didn't exist, or Create but it did exist
throw new VsyncException(g.reason);
}
if (!g.HasFirstView)
{
throw new VsyncException("JoinWait barrier returned but group has no view!!! " + g.gaddr);
}
if (g.theView.GetMyRank() == -1)
{
throw new VsyncException("JoinWait barrier returned but I'm not in " + Address.VectorToString(g.theView.members) + " (gaddr " + g.gaddr + ")");
}
}
if ((g.flags & G_NEEDSTATEXFER) != 0)
{
ILock.NoteThreadState("xferWait.WaitOne()");
g.xferWait.WaitOne();
ILock.NoteThreadState(null);
g.xferWait.Release();
}
int lcnt = 0;
while (!g.CallbacksDone)
{
if (lcnt++ > 250)
{
throw new VsyncException("Stuck waiting for NEWVIEW callbacks to occur in <" + g.gname + ">.Join()");
}
Vsync.Sleep(100);
}
}
}
internal class vGroup
{
internal Address creator;
internal Address vGroupAddr;
internal Address[] vGMembers;
internal vGroup(Address who, Address[] members)
{
this.creator = Vsync.my_address;
this.vGroupAddr = who;
this.vGMembers = members;
}
internal vGroup(Address createdBy, Address who, Address[] members)
{
this.creator = createdBy;
this.vGroupAddr = who;
this.vGMembers = members;
}
}
internal static List<vGroup> vGList = new List<vGroup>();
internal static LockObject vGLock = new LockObject("vGLock");
internal static Random rand = new Random();
internal static Address virtualGroup(Address[] members)
{
using (var tmpLockObj = new LockAndElevate(vGLock))
{
foreach (vGroup vg in vGList)
{
if (vGMatch(vg.vGMembers, members))
{
return vg.vGroupAddr;
}
}
}
vGroup nvg = new vGroup(newVGAddress(members), members);
using (var tmpLockObj = new LockAndElevate(vGLock))
{
vGList.Add(nvg);
}
return nvg.vGroupAddr;
}
internal static bool vGMatch(Address[] l0, Address[] l1)
{
if (l0.Length != l1.Length)
{
return false;
}
for (int i = 0; i < l0.Length; i++)
{
if (l0[i] != l1[i])
{
return false;
}
}
return true;
}
internal static Address newVGAddress(Address[] members)
{
// Creates a placeholder Address() with a negative pid (used to notice that this is a placeholder), picked
// to reduce risk of collisions if we test Vsync with multiple "masters" on one machine. Otherwise the home
// IP address should suffice to avoid problems.
Address nVGA = new Address(Vsync.my_address.home, -rand.Next(1, int.MaxValue));
if (Vsync.ClientOf != null)
{
Vsync.ORACLE.P2PSend(Vsync.ClientOf, Vsync.RELAYREGISTERVG, Vsync.my_address, nVGA, members);
}
else
{
Vsync.ORACLE.doSend(false, false, Vsync.REGISTERVG, Vsync.my_address, nVGA, members);
}
return nVGA;
}
internal static void noteVGMap(Address master, Address nVGA, Address[] members)
{
if (master.isMyAddress() || Vsync.VSYNCMEMBERS.theView.GetRankOf(master) == -1)
{
return;
}
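// Record the mapping under the address the master announced, rather than allocating (and re-registering) a new one
vGroup nvg = new vGroup(master, nVGA, members);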
using (var tmpLockObj = new LockAndElevate(vGLock))
{
foreach (vGroup vg in vGList)
{
if (vg.vGroupAddr == nVGA)
{
throw new VsyncException("Reuse of nVGA " + nVGA);
}
}
vGList.Add(nvg);
}
Vsync.VSYNCMEMBERS.Watch[master] += ev =>
{
using (var tmpLockObj = new LockAndElevate(vGLock))
{
vGList.Remove(nvg);
}
};
}
internal static vGroup vGLookup(Address vga)
{
using (var tmpLockObj = new LockAndElevate(vGLock))
{
foreach (vGroup vg in vGList)
{
if (vg.vGroupAddr == vga)
{
return vg;
}
}
}
return null;
}
internal static string vgGetState()
{
string s = "MEMBER-SET SHORTCUTS: ";
using (var tmpLockObj = new LockAndElevate(vGLock))
{
if (vGList.Count == 0)
{
return string.Empty;
}
foreach (vGroup vg in vGList)
{
s += Environment.NewLine + " Creator: " + vg.creator + ", members={" + Address.VectorToString(vg.vGMembers) + "}";
}
}
return s;
}
internal static void doJoin(int mode, int level, Address[] newMembers, Group[] groups, long offset)
{
doJoin(mode, level, virtualGroup(newMembers), groups, offset);
}
internal static void doJoin(int mode, int level, Group[] groups, long offset)
{
doJoin(mode, level, Vsync.my_address, groups, offset);
}
internal static void doJoin(int mode, int level, Address who, Group[] groups, long offset)
{
int groupsToJoin = 0;
groups = groups.AsEnumerable().Where(g => ((!g.GroupOpen) || (!who.isMyAddress()))).ToArray();
foreach (Group g in groups)
{
g.TypeSig = TypeSignature(g);
groupsToJoin++;
g.GroupOpen = true;
}
if (groupsToJoin == 0)
{
return;
}
Address[] gaddrs = new Address[groups.Length];
string[] gnames = new string[groups.Length];
int[] flags = new int[groups.Length];
long[] tsigs = new long[groups.Length];
for (int idx = 0; idx < groups.Length; idx++)
{
gnames[idx] = groups[idx].gname;
gaddrs[idx] = groups[idx].gaddr;
flags[idx] = groups[idx].flags;
tsigs[idx] = groups[idx].TypeSig;
if (gnames[idx] == null || gaddrs[idx] == null)
{
throw new VsyncException("Something is wrong in doJoin");
}
}
if (Vsync.ClientOf != null)
{
if ((VsyncSystem.Debug & VsyncSystem.RELAYLOGIC) != 0)
{
Vsync.WriteLine("Invoking Vsync.RELAYJOIN: sender " + Vsync.my_address + ", client of " + Vsync.ClientOf + ", vectors of length " + gnames.Length);
for (int n = 0; n < gnames.Length; n++)
{
Vsync.WriteLine(" gname " + gnames[n] + ", gaddrs " + gaddrs[n] + ", gsigs " + tsigs[n]);
}
}
bool ISend = false;
while (Vsync.ClientOf != null && Vsync.ORACLE.doP2PQuery(Vsync.ClientOf, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_FAILURE, "RELAYJOIN"), Vsync.RELAYJOIN, who, Vsync.ORACLE.uids++, mode, gnames, gaddrs, offset, tsigs, flags).Length == 0)
{
if (Vsync.newClientOfCnt++ == 0)
{
ISend = true;
}
Vsync.ClientOf = null;
if (ISend)
{
Vsync.ORACLE.doSend(false, false, Vsync.JOIN, Vsync.my_address, mode, new[] { "ORACLE" }, new[] { Vsync.ORACLE.gaddr }, offset, new[] { 0L }, new[] { 0 }, ++VsyncSystem.VsyncJoinCounter);
}
ILock.Barrier(ILock.LLWAIT, ILock.LCLIENTOF).BarrierWait(Vsync.ORACLE);
}
}
else
{
if ((VsyncSystem.Debug & VsyncSystem.GVELOGIC) != 0)
{
string gns = string.Empty, isls = " ";
foreach (string gs in gnames)
{
gns += " " + gs;
}
foreach (int f in flags)
{
isls += f + " ";
}
Vsync.WriteLine("Sending Vsync.JOIN requests to the ORACLE for gnames [" + gns + " ], gaddrs [" + Address.VectorToString(gaddrs) + "], isLarge{" + isls + "}");
}
Vsync.ORACLE.doSend(false, false, Vsync.JOIN, who, mode, gnames, gaddrs, offset, tsigs, flags, ++VsyncSystem.VsyncJoinCounter);
}
VsyncSystem.VsyncRestarting = false;
}
internal static void doTerminate(Group[] groups)
{
Address[] gaddrs = new Address[groups.Length];
int idx = 0;
foreach (Group g in groups)
{
if (g.GroupOpen)
{
g.Query(ALL, new Timeout(15000, Timeout.TO_FAILURE), Vsync.TERMINATE, g.gaddr, EOL);
g.Flush();
gaddrs[idx++] = g.gaddr;
}
}
while (Vsync.ClientOf != null)
{
if (Vsync.ORACLE.doP2PQuery(Vsync.ClientOf, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_FAILURE, "RELAYTERM"), Vsync.RELAYTERM, Vsync.my_address, Vsync.ORACLE.uids++, gaddrs).Length != 0)
{
break;
}
}
if (Vsync.ClientOf == null)
{
Vsync.ORACLE.doSend(false, false, Vsync.TERMINATE, Vsync.my_address, Vsync.ORACLE.uids++, gaddrs);
}
TerminationWait(groups);
}
internal const int NPING = 3;
// Every Vsync member periodically pings up to NPING randomly selected members of the groups to which it belongs.
// For example, if NPING=3, each member picks three other processes (if it can find that many)
// and just nudges them, then sleeps for 30 seconds before the next round. This triggers various failure detection
// logic so that if a process fails, or even an entire group fails, we'll figure it out and clean up the mess
internal static void GroupMemberHeartBeat()
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
try
{
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[9]++;
Random rand = new Random();
List<Address> alist = new List<Address>();
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
int N = 0;
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
if (kvp.Value.HasFirstView)
{
N += kvp.Value.theView.members.Length;
}
}
if (N > 1)
{
for (int k = 0; alist.Count < NPING && k < N; k++)
{
int which = rand.Next(N);
int cnt = 0;
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
Group g = kvp.Value;
if (g.HasFirstView)
{
int off = which - cnt;
if (off >= 0 && off < g.theView.members.Length)
{
if (!g.theView.members[off].isMyAddress() && !alist.Contains(g.theView.members[off]))
{
alist.Add(g.theView.members[off]);
}
}
cnt += g.theView.members.Length;
}
}
}
}
}
foreach (Address a in alist)
{
SendPing(a);
}
Vsync.Sleep(30000);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}
internal static MCMDSocket.GRPair[] GroupRates()
{
List<MCMDSocket.GRPair> GRPList = new List<MCMDSocket.GRPair>();
using (var tmpLockObj = new LockAndElevate(VsyncGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in VsyncGroups)
{
Group g = kvp.Value;
if ((VsyncSystem.Debug & VsyncSystem.MCMD) != 0)
{
Vsync.WriteLine("MCMD: Updating rate info for gaddr=" + g.gaddr + ", g.rcvdMcastsRate = (rate:" + g.rcvdMcastsRate + " cnt:" + g.rcvdMcastsCnt + ")/2 giving " + ((g.rcvdMcastsRate + g.rcvdMcastsCnt) / 2));
}
g.rcvdMcastsRate = (g.rcvdMcastsRate + g.rcvdMcastsCnt) / 2;
g.rcvdMcastsCnt = 0;
GRPList.Add(new MCMDSocket.GRPair(g.gaddr, g.rcvdMcastsRate));
}
}
return GRPList.ToArray();
}
internal void OracleHeartBeat()
{
try
{
int n = 1;
int looped = 0;
int counter = 0;
List<Address> CleanUp = new List<Address>();
while (VsyncSystem.VsyncActive && (this.GroupOpen || !this.WasOpen))
{
VsyncSystem.RTS.ThreadCntrs[10]++;
Vsync.Sleep(1000);
View theView;
Address sendTo = null;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (!this.HasFirstView)
{
if (looped++ == 5)
{
if ((VsyncSystem.Debug & VsyncSystem.STARTSEQ) != 0)
{
Vsync.WriteLine("VSYNC ORACLE SERVICE: Restarted");
}
VsyncSystem.VsyncRestarting = false;
View v = new View("ORACLE", Vsync.ORACLE.gaddr, new[] { Vsync.my_address }, 0, false);
Vsync.ORACLE.NewView(v, "ORACLE", null);
ReliableSender.StartGroupReader(Vsync.ORACLE);
// Runs only in the current leader
Vsync.ORACLE.LeaderMode = true;
Vsync.ORACLE.GroupOpen = true;
Vsync.OracleViewThread = new Thread(Vsync.OracleViewTask) { Name = "Vsync <ORACLE> View Thread", IsBackground = true };
Vsync.IAmOracle = true;
Vsync.OracleViewThread.Start();
MCMDSocket.RunMappingTask();
}
}
else if (theView != null)
{
if (theView.IAmLeader())
{
n = (n + 1) % theView.members.Length;
if (!theView.members[n].isMyAddress() && !theView.hasFailed[n])
{
SendPing(theView.members[n]);
}
}
else if ((n = theView.GetMyRank()) > 0)
{
while (theView.hasFailed[--n])
{
if (n == 0)
{
throw new VsyncException("Total failure of the ORACLE group (of which I'm a client): Vsync must shut down.");
}
}
SendPing(theView.members[n]);
}
if ((counter % 1) == 0)
{
// Every second, the Oracle members....
using (var tmpLockObj = new LockAndElevate(TPGroupsLock))
{
foreach (KeyValuePair<Address, Group> kvp in TPGroups)
{
Group g = kvp.Value;
int r = theView.GetMyRank();
if (this.HasFirstView && r != -1 && (((g.gaddr.GetHashCode() + counter) % theView.members.Length) == r))
{
if (g.theView != null && g.theView.members.Length > 0)
{
for (int i = 0; i < g.theView.members.Length; i++)
{
if (!g.theView.hasFailed[i])
{
// ... take turns pinging the leaders of each group (the first non-failed member in the view)
// The idea is to spread load but also to be sure that every group definitely does get pinged once per second
sendTo = g.theView.members[i];
break;
}
}
}
}
}
}
}
if (sendTo != null)
{
SendPing(sendTo);
}
}
if (++counter % 600 == 0)
{
/* Every 600 iterations, i.e. about every ten minutes at one iteration per second */
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
foreach (Address a in CleanUp)
{
Vsync.RIPList.Remove(a);
}
CleanUp = new List<Address>();
foreach (Address a in Vsync.RIPList)
{
CleanUp.Add(a);
}
}
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}
// Ping him, unless something is already in the send queue or the send queue length is excessive
private static void SendPing(Address sendTo)
{
if (sendTo.isMyAddress())
{
return;
}
bool fnd = false;
if ((VsyncSystem.Debug & VsyncSystem.LOWLEVELMSGS) != 0)
{
using (var tmpLockObj = new LockAndElevate(ReliableSender.ackInfoLock))
{
ReliableSender.ackInfo.Add("[" + Vsync.MsToSecs(Vsync.NOW) + "]: SendPing to " + sendTo + Environment.NewLine);
}
}
using (var tmpLockObj = new LockAndElevate(ReliableSender.PendingSendBufferLock))
{
if (ReliableSender.PendingSendBuffer.Count + ReliableSender.P2PPendingSendBuffer.Count > Vsync.VSYNC_ASYNCMTOTALLIMIT * 2)
{
fnd = true;
}
else
{
foreach (ReliableSender.MsgDesc md in ReliableSender.PendingSendBuffer)
{
if (md.dest == sendTo)
{
fnd = true;
break;
}
}
if (!fnd)
{
foreach (ReliableSender.MsgDesc md in ReliableSender.P2PPendingSendBuffer)
{
if (md.dest == sendTo)
{
fnd = true;
break;
}
}
}
}
}
if (!fnd)
{
ReliableSender.SendP2P(Msg.ISPING, sendTo, null, new byte[0], true);
}
}
internal LockObject RateLock = new LockObject("RateLock");
internal int accumulatedMsgTokens = 0;
internal int accumulatedByteTokens = 0;
internal int msgTokenRate = int.MaxValue;
internal int byteTokenRate = int.MaxValue;
internal int sleepDelay = 0;
internal Thread RateThread;
/// <summary>
/// Sets a rate limit for multicasts expressed as messages per second
/// </summary>
/// <param name="msgspersec">Limit on messages per second</param>
public void SetRateLimit(int msgspersec)
{
this.SetRateLimit(msgspersec, int.MaxValue, 1000);
}
/// <summary>
/// Sets a rate limit for multicasts in this group.
/// </summary>
/// <param name="msgspertimeunit">Limit on messages per time unit</param>
/// <param name="bytespertimeunit">Limit on bytes per time unit</param>
/// <param name="timeunit">Time unit for the limits in milliseconds</param>
public void SetRateLimit(int msgspertimeunit, int bytespertimeunit, int timeunit)
{
using (var tmpLockObj = new LockAndElevate(this.RateLock))
{
this.msgTokenRate = msgspertimeunit;
this.byteTokenRate = bytespertimeunit;
this.sleepDelay = Math.Max(timeunit, 50);
if (this.RateThread == null)
{
this.RateThread = new Thread(() =>
{
while (!VsyncSystem.VsyncActive)
{
Vsync.Sleep(250);
}
try
{
while (VsyncSystem.VsyncActive)
{
VsyncSystem.RTS.ThreadCntrs[11]++;
Monitor.Enter(this.RateLock);
if (this.msgTokenRate >= int.MaxValue / 2)
{
this.accumulatedMsgTokens = this.msgTokenRate;
}
else
{
this.accumulatedMsgTokens = (this.accumulatedMsgTokens / 2) + this.msgTokenRate;
}
if (this.byteTokenRate >= int.MaxValue / 2)
{
this.accumulatedByteTokens = this.byteTokenRate;
}
else
{
this.accumulatedByteTokens = (this.accumulatedByteTokens / 2) + this.byteTokenRate;
}
Monitor.PulseAll(this.RateLock);
Monitor.Exit(this.RateLock);
Vsync.Sleep(this.sleepDelay);
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "Group [" + this.gname + "]: Leaky bucket rate controller thread", IsBackground = true };
this.RateThread.Start();
}
}
}
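// Worked example of the token refill performed by the rate controller thread above (illustrative numbers, not taken from the Vsync manual):
// each period the thread sets tokens = (tokens / 2) + rate, so unused tokens decay by half instead of accumulating without bound.
// With msgTokenRate = 100 and no sends, accumulatedMsgTokens goes 0 -> 100 -> 150 -> 175 -> 187 -> 193 -> ... and levels off just
// below 200 (integer division), so a burst after an idle period is capped at roughly twice the configured rate.
// Usage sketch, assuming g is a Group that has already been joined:
//     g.SetRateLimit(100);                 // about 100 multicasts per second on average
//     g.SetRateLimit(100, 64 * 1024, 500); // 100 message tokens and 65536 byte tokens per 500 ms period
// Note that the callers in this file pass obs.Length (the argument count) to RateLimit as the byte-token cost.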
// Implements the internal logic for the leaky bucket rate controller, supported optionally on a per-group basis
internal void RateLimit(int nb)
{
if (this.RateThread == null)
{
return;
}
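Monitor.Enter(this.RateLock);
try
{
// Block until the rate controller thread has replenished enough tokens for this send
while (this.accumulatedMsgTokens == 0 || this.accumulatedByteTokens < nb)
{
Monitor.Wait(this.RateLock);
}
this.accumulatedMsgTokens--;
this.accumulatedByteTokens -= nb;
}
finally
{
// Release the lock even if the wait is interrupted
Monitor.Exit(this.RateLock);
}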
}
private readonly List<KeyValuePair<int, long>> IdsByThreadId = new List<KeyValuePair<int, long>>();
private readonly LockObject IdsByThreadIdLock = new LockObject("IdsByThreadIdLock");
internal long newLoggingId()
{
long myId;
using (var tmpLockObj = new LockAndElevate(this.IdsByThreadIdLock))
{
myId = this.myLoggingId++;
foreach (KeyValuePair<int, long> kvp in this.IdsByThreadId)
{
if (kvp.Key == Thread.CurrentThread.ManagedThreadId)
{
this.IdsByThreadId.Remove(kvp);
break;
}
}
this.IdsByThreadId.Add(new KeyValuePair<int, long>(Thread.CurrentThread.ManagedThreadId, myId));
}
return myId;
}
internal long lookupLoggingId()
{
using (var tmpLockObj = new LockAndElevate(this.IdsByThreadIdLock))
{
foreach (KeyValuePair<int, long> kvp in this.IdsByThreadId)
{
if (kvp.Key == Thread.CurrentThread.ManagedThreadId)
{
return kvp.Value;
}
}
}
return -1;
}
internal volatile bool usesSubsetSend;
internal volatile bool receivedOrderedSends;
/// <summary>
/// A totally ordered Vsync multicast primitive. All members receive these messages in the same view and in the identical order.
/// </summary>
/// <param name="obs">A variable-length argument list specifying a request ID and a set of arguments that should match some handler.</param>
/// <remarks>
/// A totally ordered Vsync multicast primitive that can be extremely fast, particularly if sent by the group leader. OrderedSend is a <i>non-durable multicast</i>.
///
/// This non-durable ordered form of Send is often the right choice, as it offers total ordering but can be very fast, especially if sent from the group leader.
///
/// OrderedSend maps to an IP multicast, but delivery is delayed until the leader specifies the ordering information (the leader gets a special
/// break: in that member, OrderedSend is performed using the cheapest Send, because FIFO ordering is all that is needed in that case).
/// OrderedSend is synchronized with respect to membership changes in accordance with the virtual synchrony model, which is a strong and
/// useful property. The first parameter is an identifier for the message handler to invoke (a small integer) and the remaining parameters
/// are the arguments, which are objects of types known to Vsync. An exception will be thrown if the group has no message handler
/// registered for the specified request id that matches the number and types of the objects you specified.
///
/// Read more about the <i>virtual synchrony model</i> to learn about the options
/// and how to pick the fastest mechanism for the setting in which your application will run.
/// </remarks>
public void OrderedSend(params object[] obs)
{
if (this.receivedOrderedSends && this.UnstableCount == 0 && ReliableSender.PendingSendBuffer.Count == 0 && ReliableSender.P2PPendingSendBuffer.Count == 0 && this.ToDoCount == 0 && this.incomingSends.FullSlots() == 0)
{
// A conservative test: no unstable incoming messages, so everyone knows of any prior OrderedSends by other senders. And my
// PendingSendBuffer and P2PPendingSendBuffer are empty, so everyone knows of anything I already sent.
// And... last but not least... I'm "caught up" for incoming messages...
this.receivedOrderedSends = false;
}
if (this.IAmRank0() && !this.receivedOrderedSends)
{
this._Send(true, obs);
return;
}
if (!this.VsyncCallStart())
{
return;
}
// Move a subset destination list given as the second argument to the front (same logic as in _Send)
obs = subsetShortcut(obs);
this.ThreadCheck();
if (IsSubsetSend(obs[0]))
{
this.usesSubsetSend = true;
this.doOrderedSubsetSend(obs);
}
else
{
if (this.usesSubsetSend && (VsyncSystem.Debug & VsyncSystem.WARNABOUTSUBSETS) != 0)
{
Vsync.WriteLine("WARNING: In group <" + this.gname + "> application is mixing OrderedSubsetSends with OrderedSend. The relative order is not guaranteed");
}
this.OrderedSendWith((vid, mid, flag, m) => this.doSend(vid, mid, m.nRaw, flag, Vsync.ORDEREDSEND, m), obs);
}
this.VsyncCallDone();
}
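// Usage sketch (hypothetical names: UPDATE is an application-chosen request id with a matching registered handler, and g is a joined Group):
//     const int UPDATE = 1;
//     g.OrderedSend(UPDATE, "price", 42.5);   // every member delivers this in the same order relative to other OrderedSends
// The first element of the argument list selects the handler; the remaining elements must match that handler's parameter types.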
private static bool IsSubsetSend(object obj)
{
Type t = obj.GetType();
return t == typeof(List<Address>) || t == typeof(Address[]) || (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(QueryKey<>)) || (t == typeof(Msg) && ((Msg)obj).destList != null);
}
private void doOrderedSubsetSend(object[] obs)
{
bool copyDown = false;
List<Address> dests;
Type t = obs[0].GetType();
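if (t == typeof(List<Address>))
{
dests = (List<Address>)obs[0];
copyDown = true;
}
else if (t == typeof(Address[]))
{
// IsSubsetSend also accepts Address[]; convert it the same way GetDests does
dests = ((Address[])obs[0]).ToList();
copyDown = true;
}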
else if (t == typeof(Msg) && ((Msg)obs[0]).destList != null)
{
dests = ((Msg)obs[0]).destList;
}
else if (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(QueryKey<>))
{
dests = ((QKD)obs[0]).GetDests(this);
copyDown = true;
}
else
{
throw new VsyncException("doOrderedSubsetSend: can't identify the subset destination list");
}
if (copyDown)
{
object[] newObs = new object[obs.Length - 1];
for (int n = 0; n < obs.Length - 1; n++)
{
newObs[n] = obs[n + 1];
}
obs = newObs;
}
Msg m = new Msg(obs);
this.SetMsgIds(m, false, true);
List<long> ts = new List<long>();
List<Address> who = new List<Address>();
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("doOrderedSubsetSend[" + Address.VectorToString(dests.ToArray()) + "]... sending " + m.sender + "::" + m.vid + ":" + m.msgid + "... will wait for " + dests.Count + " responses");
}
FlowControl.FCBarrierCheck(dests);
int nr = this.doQuery(dests.Count, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT * 3 / 2, Timeout.TO_FAILURE), dests, Vsync.ORDEREDSEND, dests, this.myTS, m, EOL, ts, who);
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("doOrderedSubsetSend... " + m.sender + "::" + m.vid + ":" + m.msgid + "... got " + nr + " responses");
}
if (nr > 0)
{
long cts = ts[0];
Address cwho = who[0];
for (int n = 1; n < nr; n++)
{
if (ts[n] == cts)
{
cts = ts[n];
if (who[n].CompareTo(cwho) > 0)
{
cwho = who[n];
}
}
else if (ts[n] > cts)
{
cts = ts[n];
cwho = who[n];
}
}
if ((VsyncSystem.Debug & VsyncSystem.ORDEREDSEND) != 0)
{
Vsync.WriteLine("doOrderedSubsetSend... got " + nr + " responses for " + m.sender + "::" + m.vid + ":" + m.msgid);
Vsync.WriteLine("... send [" + Address.VectorToString(dests.ToArray()) + "] commit time @ " + cts + "::" + cwho);
}
if (cts > this.myTS)
{
this.myTS = cts;
}
this.doSend(false, false, dests, Vsync.ORDEREDSEND, m.sender, m.vid, m.msgid, cts, cwho);
}
}
private void OrderedSendWith(Action<int, int, bool, Msg> theSend, object[] obs)
{
if ((this.flags & G_ISLARGE) != 0)
{
this.Send(obs);
return;
}
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_ORDEREDSEND, IL_START, Vsync.my_address, this.newLoggingId(), obs);
}
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
if (obs.Length != 1 || obs[0].GetType() != typeof(Msg))
{
this.cbCheck(obs);
}
this.RateLimit(obs.Length);
FlowControl.FCBarrierCheck();
Msg m;
if (obs.Length == 1 && obs[0].GetType() == typeof(Msg))
{
m = (Msg)obs[0];
}
else
{
m = new Msg(obs);
try
{
ILock.NoteThreadState("Wedged(OrderedSendWith).WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
using (var tmpLockObj = new LockAndElevate(this.CommitLock))
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
m.vid = this.theView == null ? -1 : this.theView.viewid;
m.msgid = this.nextMsgid++;
m.nRaw = this.nRaw;
this.nRaw = 0;
}
}
finally
{
this.Wedged.Release();
}
}
m.dest = this.gaddr;
theSend(m.vid, m.msgid, false, m);
}
private delegate void MergeSafeSendReplies(int[] timestamp, Address[] who);
/// <summary>
/// The slowest but strongest of the Vsync multicast primitives, offering a totally ordered, durable multicast matching the Paxos semantics.
/// </summary>
/// <param name="obs">A variable-length argument list specifying a request ID and a set of arguments that should match some handler.</param>
/// <exception cref="VsyncSafeSendException">Thrown if the group size drops below the SafeSend threshold</exception>
/// <remarks>
/// The slowest but strongest of the Vsync multicast primitives, offering a totally ordered, durable multicast matching the Paxos semantics
/// and suitable for use in State Machine Replication or Transactional Database settings where the very strongest possible properties
/// are needed, even at substantial performance cost.
///
/// This form of Send is costly, and might not be the right choice: the properties are very powerful, but they aren't cheap.
/// The cost is even higher if SafeSend is used in a large group from any member except the rank-0 process (this cost could
/// be reduced, but the needed code isn't trivial and this version of Vsync is just not optimized for SafeSend by non-rank-0 members of large groups).
///
/// Before using SafeSend we recommend that you consider using the mostly-safe Send primitives and then calling Flush before replying
/// to the end-user. This will usually be just as safe and yet much faster for many purposes. We've written a paper on this topic.
///
/// SafeSend maps to a two-phase commit: the first phase is via IP multicast, but safeSendThreshold group members must ack before delivery occurs.
/// The default for safeSendThreshold is ALL, but you can use SetSafeSendThreshold to set a lower value, such as 2 or 3.
/// The sender counts as one of the acking members, so Vsync actually waits for safeSendThreshold-1 replies.
/// If a view change occurs, any pending SafeSend messages that have not yet been delivered are flushed (hence reach
/// all members) and then delivered as part of the new view COMMIT protocol.
///
/// The SafeSend primitive is synchronized with respect to membership changes in accordance with the virtual synchrony model, which is a strong and
/// useful property. The first parameter is an identifier for the message handler to invoke (a small integer) and the remaining parameters
/// are the arguments, which are objects of types known to Vsync. An exception will be thrown if the group has no message handler
/// registered for the specified request id that matches the number and types of the objects you specified.
///
/// Read more about the <i>virtual synchrony model</i> to learn about the options
/// and how to pick the fastest mechanism for the setting in which your application will run.
/// </remarks>
public void SafeSend(params object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
if ((this.flags & G_ISLARGE) != 0)
{
throw new VsyncException("Safesend: Not supported in large groups");
}
this.ThreadCheck();
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_SAFESEND, IL_START, Vsync.my_address, this.newLoggingId(), obs);
}
this.TypeCheck(obs);
this.RateLimit(obs.Length);
FlowControl.FCBarrierCheck();
this.doSafeSend(obs);
this.VsyncCallDone();
}
/// <summary>
/// This method is used to specify how many members of a group need to acknowledge a SafeSend multicast before it can be delivered.
/// For example, if the value specified is 3, then once a copy of the message is at 3 members, delivery occurs (even if the group has 1000 members).
/// </summary>
/// <param name="value">The number of members SafeSend will wait for</param>
/// <remarks>SafeSend waits for replies from the members ranked 0..SafeSendThreshold-1. This parameter functions exactly as does the Paxos
/// "number of acceptors" parameter (continuing the analogy, the full set of members are the "learners")</remarks>
public void SetSafeSendThreshold(int value)
{
this.safeSendThreshold = value;
}
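// Usage sketch (hypothetical request id UPDATE; g is assumed to be a joined Group with at least three members):
//     g.SetSafeSendThreshold(3);          // deliver once three members hold a copy of the message
//     g.SafeSend(UPDATE, "price", 42.5);  // throws VsyncSafeSendException if fewer than three members are live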
internal int GetSafeSendThreshold()
{
View theView;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theView = this.theView;
}
if (this.HasFirstView)
{
if (this.safeSendThreshold == ALL)
{
return theView.members.Length;
}
if (theView.members.Length < this.safeSendThreshold)
{
return -1;
}
return this.safeSendThreshold;
}
return -1;
}
/// <summary>
/// Specifies a method that will be used to ensure the durability of a SafeSend. If not specified, in-memory caching at the acceptors is used
/// </summary>
/// <param name="theMethod">A method implementing the durabilityMethod API</param>
public void SetDurabilityMethod(durabilityMethod theMethod)
{
this.safeSendDurabilityMethod = theMethod;
}
private void doSafeSend(object[] obs)
{
if (this.theView.nLive() < this.safeSendThreshold)
{
throw new VsyncSafeSendException("Group<" + this.gname + ">: size dropped below SafeSend threshold=" + this.safeSendThreshold);
}
int maxTime = int.MinValue;
Address maxAddr = Vsync.my_address;
this.cbCheck(obs);
Msg m;
if (obs.Length == 1 && obs[0].GetType() == typeof(Msg))
{
m = (Msg)obs[0];
}
else
{
m = new Msg(obs);
try
{
ILock.NoteThreadState("Wedged(doSafeSend).WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
using (var tmpLockObj = new LockAndElevate(this.groupLock))
using (var tmpLockObj1 = new LockAndElevate(this.CommitLock))
{
m.vid = this.theView.viewid;
m.msgid = this.nextMsgid++;
m.nRaw = this.nRaw;
this.nRaw = 0;
}
}
finally
{
this.Wedged.Release();
}
}
// Timeout needs to be long because we might encounter the disklogger in the middle of garbage collection, which involves copying files and can take a while
Msg.InvokeFromBArrays(this.QueryToBAFromSystem(m.vid, m.msgid, m.nRaw, this.safeSendThreshold, new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT * 4, Timeout.TO_FAILURE, "SAFESEND"), Vsync.SAFESEND, Vsync.my_address, m.msgid, m), new MergeSafeSendReplies((timestamp, who) =>
{
if (timestamp.Length < this.safeSendThreshold)
{
throw new VsyncSafeSendException("Group<" + this.gname + ">: size dropped below SafeSend threshold=" + this.safeSendThreshold);
}
for (int n = 0; n < timestamp.Length; n++)
{
if (timestamp[n] > maxTime || (timestamp[n] == maxTime && who[n].GetHashCode() > maxAddr.GetHashCode()))
{
maxTime = timestamp[n];
maxAddr = who[n];
}
}
}));
// By using Send instead of doSend, this code deliberately allows the Send to be blocked if the group
// is wedged for a membership change (doSend ignores that kind of wedging). In such cases upon receipt,
// the corresponding message will already have been delivered. But we don't want flow control to kick in.
this.NonFlowControlledSend(Vsync.SAFEDELIVER, Vsync.my_address, m.msgid, maxTime, maxAddr);
}
/// <summary>
/// The cheapest and fastest of the Vsync multicast primitives, offering a FIFO (by sender) but otherwise unordered, non-durable multicast.
/// </summary>
/// <param name="obs">A variable-length argument list specifying a request ID and a set of arguments that should match some handler.</param>
/// <remarks>
/// The cheapest and fastest of the Vsync multicast primitives, offering a FIFO (by sender) but otherwise unordered, non-durable multicast.
///
/// This cheap form of Send is usually, but not always, the right choice. Read more about the <i>virtual synchrony model</i> to learn about the options
/// and how to pick the fastest mechanism for the setting in which your application will run.
///
/// Send maps to an IP multicast, but with reliability and ordering on a per-sender basis, and the message is delivered as soon as it arrives.
/// Nonetheless, Send is synchronized with respect to membership changes in accordance with the virtual synchrony model, which is a strong and
/// useful property. The first parameter is an identifier for the message handler to invoke (a small integer) and the remaining parameters
/// are the arguments, which are objects of types known to Vsync. An exception will be thrown if the group has no message handler
/// registered for the specified request id that matches the number and types of the objects you specified.
/// </remarks>
public void Send(params object[] obs)
{
this._Send(true, obs);
}
private void NonFlowControlledSend(params object[] obs)
{
this._Send(false, obs);
}
private void _Send(bool okToBlock, object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
obs = subsetShortcut(obs);
if (okToBlock)
{
this.ThreadCheck();
}
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_SEND, IL_START, Vsync.my_address, this.newLoggingId(), obs);
}
if (okToBlock)
{
this.RateLimit(obs.Length);
}
List<Address> dests = null;
if (IsSubsetSend(obs[0]))
{
dests = GetDests(obs);
}
if (okToBlock)
{
FlowControl.FCBarrierCheck(dests);
}
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
this.doSend(false, false, obs);
this.VsyncCallDone();
}
private List<Address> GetDests(object[] obs)
{
List<Address> dests;
Type t = obs[0].GetType();
if (t == typeof(List<Address>))
{
dests = (List<Address>)obs[0];
}
else if (t == typeof(Address[]))
{
dests = ((Address[])obs[0]).ToList();
}
else if (t == typeof(QKD) || (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(QueryKey<>)))
{
dests = ((QKD)obs[0]).GetDests(this);
}
else
{
dests = ((Msg)obs[0]).destList;
}
return dests;
}
private static object[] subsetShortcut(object[] obs)
{
if (obs.Length > 1 && obs[0] is int && IsSubsetSend(obs[1]))
{
object[] newobs = new object[obs.Length + 1];
newobs[0] = obs[1];
for (int i = 0; i < obs.Length; i++)
{
newobs[i + 1] = obs[i];
}
obs = newobs;
}
return obs;
}
internal void ThreadCheck()
{
if (this != Vsync.ORACLE && this != Vsync.VSYNCMEMBERS && (!VsyncSystem.VsyncActive || VsyncSystem.VsyncRestarting || !this.GroupOpen))
{
throw new VsyncException("Group multicast operation was initiated before the system was fully operational or before the group Join completed");
}
if (Thread.CurrentThread == this.groupIPMCReaderThread)
{
Vsync.WriteLine("WARNING: <" + this.gname + ">.Send()/g.Query() or some multicast primitive was called from the incoming multicasts delivery thread (high risk of deadlocks)");
}
}
/// <summary>
/// An unreliable version of Send.
/// </summary>
/// <param name="obs">A variable-length argument list specifying a request ID and a set of arguments that should match some handler.</param>
/// <remarks>
/// The RawSend API is provided for advanced users only. The multicast is sent without retransmission in the event of failure and
/// is not acknowledged by the receiver. RawSend is currently disabled for Large Group cases (it maps to reliable Send).
///
/// Please note that when using RawSend at high rates, loss rates can spike unless the sender employs a non-zero value for the VSYNC_RATELIM parameter,
/// which limits the number of packets transmitted per second. See the user manual for details.
/// </remarks>
public void RawSend(params object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
this.ThreadCheck();
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_SEND, IL_START, Vsync.my_address, this.newLoggingId(), obs);
}
this.RateLimit(obs.Length);
FlowControl.FCBarrierCheck();
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
this.doSend(false, (this.flags & G_ISLARGE) == 0, obs);
this.VsyncCallDone();
}
/// <summary>
/// A Vsync multicast that runs nearly as fast as Send, but with a stronger "causality-preserving" ordering property.
/// </summary>
/// <param name="obs">A variable-length argument list specifying a request ID and a set of arguments that should match some handler.</param>
/// <remarks>
/// Imagine that Send(X) is performed by member A, and then member B, receiving X, issues Send(Y). With the FIFO property of Send, nothing prevents
/// member C from receiving Y before X! CausalSend guarantees that if X "happens before" Y, then CausalSend(X) will be delivered before CausalSend(Y).
///
/// For most purposes, Send or CausalSend is the best choice, followed by a call to Flush() prior to responding to external clients.
/// Read more about the <i>virtual synchrony model</i> to learn about the options
/// and how to pick the fastest mechanism for the setting in which your application will run.
///
/// Send and CausalSend both map to an IP multicast or a tunneled application-implemented multicast, and CausalSend will usually be delivered as soon as it arrives.
/// In groups with more than five members CausalSend is performed as an OrderedSend, and in large groups it is performed as a Send.
/// CausalSend is synchronized with respect to membership changes in accordance with the virtual synchrony model, which is a strong and
/// useful property. The first parameter is an identifier for the message handler to invoke (a small integer) and the remaining parameters
/// are the arguments, which are objects of types known to Vsync. An exception will be thrown if the group has no message handler
/// registered for the specified request id that matches the number and types of the objects you specified.
/// </remarks>
public void CausalSend(params object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
this.ThreadCheck();
if ((this.flags & G_ISLARGE) != 0)
{
this.Send(obs);
}
else if (this.theView.members.Length > 5)
{
// In larger groups the VT scheme doesn't work well; use OrderedSend instead
this.OrderedSend(obs);
}
else
{
int[] myVT;
int theVid;
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
theVid = this.theView.viewid;
myVT = new int[this.theView.myVT.Length];
Buffer.BlockCopy(this.theView.myVT, 0, myVT, 0, Buffer.ByteLength(this.theView.myVT));
myVT[this.theView.GetMyRank()] = this.theView.myClock++;
}
if ((VsyncSystem.Debug & VsyncSystem.CAUSALDELIVERY) != 0)
{
Vsync.WriteLine("Sending a causal multicast in vid=" + theVid + ", with VT=" + VTtoString(myVT));
}
this.Send(Vsync.CAUSALSEND, theVid, myVT, new Msg(obs));
}
this.VsyncCallDone();
}
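// Sketch of the small-group path above, using illustrative values: in a three-member view where this member has rank 1,
// theView.myVT = [4, 2, 7] and theView.myClock = 3, the copy sent with the message becomes [4, 3, 7] and myClock advances to 4.
// The delivery side is not shown in this part of the file; the usual vector-timestamp technique is for a receiver to compare the
// incoming vector with what it has already delivered and to hold the message back until its causal predecessors have arrived.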
internal static void doMultiSend(List<Group> glist, bool fromOracle, params object[] obs)
{
foreach (Group g in glist)
{
g.RateLimit(obs.Length);
g.TypeCheck(obs);
g.doSend(fromOracle, false, obs);
}
}
private readonly LockObject RelayedLGSendsLock = new LockObject("RelayedLGSendsLock");
private List<Msg> RelayedLGSends = new List<Msg>();
private Address prevLGOwner;
private bool LGSetupDone;
internal string LGRelayGetState()
{
string s = string.Empty;
using (var tmpLockObj = new LockAndElevate(this.RelayedLGSendsLock))
{
if (this.RelayedLGSends.Count == 0)
{
return s;
}
s = "LG Relayed Sends queue:" + Environment.NewLine;
foreach (Msg m in this.RelayedLGSends)
{
s += " -- " + m + Environment.NewLine;
}
}
return s;
}
internal void LGSetup()
{
if (this.LGSetupDone)
{
return;
}
this.LGSetupDone = true;
tokenInfo outerToken;
using (var tmpLockObj = new LockAndElevate(this.TokenLock))
{
outerToken = this.theToken;
}
if (outerToken != null)
{
this.prevLGOwner = outerToken.groupOwner;
}
// Note that the peek-ahead logic for messages watches for RELAYSEND and expects the ViewDelta vector to be the first (and only) argument... If this is changed, that also needs to change
this.doRegister(Vsync.RELAYSEND, new Action<Vsync.ViewDelta[]>(vds =>
{
tokenInfo theToken;
View theView = null;
if (!this.HasFirstView)
{
using (var tmpLockObj = new LockAndElevate(this.CommitLock))
{
Vsync.CommitGVUpdates(this, vds, ref theView);
}
}
using (var tmpLockObj = new LockAndElevate(this.TokenLock))
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
theToken = this.theToken;
theView = this.theView;
}
vds = vds.Where(vd => vd.gaddr == this.gaddr && vd.prevVid >= (theView == null ? -1 : theView.viewid)).ToArray();
if (vds.Length == 0)
{
return;
}
if ((VsyncSystem.Debug & (VsyncSystem.RELAYLOGIC | VsyncSystem.TOKENLOGIC)) != 0)
{
Vsync.WriteLine("Large Group owner relaying COMMIT for a vector of view deltas in <" + this.gname + ">");
foreach (Vsync.ViewDelta vd in vds)
{
Vsync.WriteLine(" " + vd);
}
}
if (theView == null || theToken == null)
{
throw new VsyncException("Large group had a total failure just as I was joining.");
}
theToken.applyViewDeltas(this, vds);
if (this.myFirstLeadershipView == 0)
{
return;
}
this.becomeGroupOwner();
foreach (Vsync.ViewDelta vd in vds)
{
if (vd.prevVid >= theView.viewid)
{
if ((VsyncSystem.Debug & (VsyncSystem.GROUPEVENTS | VsyncSystem.RELAYLOGIC | VsyncSystem.TOKENLOGIC | VsyncSystem.TOKENFLUSH)) != 0)
{
Vsync.WriteLine("In RELAYSEND relaying COMMIT in <" + this.gname + "> for ViewDelta " + vd);
}
using (var tmpLockObj = new LockAndElevate(theToken.slock))
{
if (theToken.unstableVIDMID == 0)
{
theToken.unstableVIDMID = this.nextMsgid;
theToken.unstableVID = theToken.viewid;
}
}
this.doSend(false, false, Vsync.COMMIT, vds);
break;
}
}
List<Msg> toRelay;
using (var tmpLockObj = new LockAndElevate(this.RelayedLGSendsLock))
{
toRelay = this.RelayedLGSends;
this.RelayedLGSends = new List<Msg>();
}
foreach (Msg m in toRelay)
{
this.doSend(false, false, m);
}
// Asynchronously send off a message to the ORACLE: I'm stable up to whatever the current viewid shows
int stableVID;
using (var tmpLockObj = new LockAndElevate(theToken.slock))
{
stableVID = theToken.stableVID;
}
Vsync.ORACLE.doSend(false, false, Vsync.ISSTABLE, this.gaddr, stableVID);
}));
// Register the flush aggregator for this group
this.doRegisterAggregator((FlushAggKey key, bool lV, bool dV) => lV && dV);
}
internal void doUnorderedSend(params object[] obs)
{
if (obs == null || obs.Length < 1)
{
throw new ArgumentNullException("obs", "Vsync.Group.Cast");
}
if (obs.Length == 1 && obs[0].GetType() == typeof(object[]))
{
obs = (object[])obs[0];
}
this.doTheSend(false, false, false, Msg.UNORDERED, obs);
}
internal void doSendNotFromOracle(params object[] obs)
{
this.doSend(false, false, obs);
}
internal void doRawSendNotFromOracle(params object[] obs)
{
this.doSend(false, true, obs);
}
internal void doSendFromOracle(params object[] obs)
{
this.doSend(true, false, obs);
}
internal void doSend(bool isFromOracle, bool isRaw, params object[] obs)
{
if (obs == null || obs.Length < 1)
{
throw new ArgumentNullException("obs", "Vsync.Group.Cast");
}
if (obs.Length == 1 && obs[0].GetType() == typeof(object[]))
{
obs = (object[])obs[0];
}
this.doTheSend(isFromOracle, isRaw, false, isRaw ? Msg.RAWMULTICAST : Msg.MULTICAST, obs);
}
internal void doSend(int vid, int msgid, int nRaw, bool isFromOracle, params object[] obs)
{
if (obs == null || obs.Length < 1)
{
throw new ArgumentNullException("obs", "Vsync.Group.Cast");
}
if (obs.Length == 1 && obs[0].GetType() == typeof(object[]))
{
obs = (object[])obs[0];
}
this.doTheSend(vid, msgid, nRaw, isFromOracle, true, false, Msg.MULTICAST, obs);
}
internal void doSendRaw(params object[] obs)
{
if (obs == null || obs.Length < 1)
{
throw new ArgumentNullException("obs", "Vsync.Group.Cast");
}
if (obs.Length == 1 && obs[0].GetType() == typeof(object[]))
{
obs = (object[])obs[0];
}
this.doTheSend(false, true, true, Msg.MULTICAST, obs);
}
private void doTheSend(bool sentByOracle, bool isRaw, bool isBeacon, byte type, object[] obs)
{
this.doTheSend(Msg.UNINITIALIZED, Msg.UNINITIALIZED, 0, sentByOracle, isRaw, isBeacon, type, obs);
}
private void doTheSend(int vid, int msgid, int nRaw, bool sentByOracle, bool isRaw, bool isBeacon, byte type, object[] obs)
{
// Loops while it finds the group wedged (e.g. membership is changing), then does the requested send
int waitingTime = 0;
for (int retry = 0; waitingTime < Vsync.VSYNC_DEFAULTTIMEOUT * 8 && (VsyncSystem.VsyncActive || !VsyncSystem.VsyncWasActive); retry++)
{
if (retry > 0)
{
int howLong = retry < 10 ? 25 : 250;
Vsync.Sleep(howLong);
waitingTime += howLong;
}
// Perhaps non-ideal, but the idea here is that from when we first assign message ids to a multicast until we send the
// last fragment, we hold the SIFLock, preventing anyone else from touching the multicast send id# counter.
// The downside is that for Vsync, this is a fairly long-held lock and may be implicated in a multi-thread deadlock
// involving the CommitLock
using (var tmpLockObj = new LockAndElevate(this.SIFLock))
{
Msg m;
int xvid = -1;
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
xvid = this.theView.viewid;
}
}
if (!isRaw && (this.flags & G_WEDGED) != 0)
{
if ((VsyncSystem.Debug & VsyncSystem.GROUPEVENTS) != 0)
{
Vsync.WriteLine("doTheSend: forced to loop because <" + this.gname + "> vid " + xvid + " is wedged");
}
continue;
}
if (obs[0].GetType() == typeof(Msg) && ((Msg)obs[0]).destList != null)
{
m = (Msg)obs[0];
foreach (Address dest in m.destList)
{
this.doP2PSend(dest, true, m);
}
return;
}
if (IsSubsetSend(obs[0]))
{
List<Address> dests = GetDests(obs);
obs = fixObs(obs);
this.cbCheck(obs);
m = new Msg(obs);
this.SetMsgIds(m, false, true);
foreach (Address dest in dests)
{
this.doP2PSend(dest, true, m);
}
return;
}
if ((VsyncSystem.Debug & VsyncSystem.MESSAGELAYER) != 0)
{
Vsync.WriteLine("doSend: GroupOpen " + this.GroupOpen + ", sentByOracle=" + sentByOracle + ", isRaw=" + isRaw + ", isBeacon=" + isBeacon + ", type=" + type);
}
if (!this.GroupOpen && !isBeacon)
{
return;
}
if (obs.Length == 1 && obs[0].GetType() == typeof(Msg))
{
m = (Msg)obs[0];
}
else
{
this.cbCheck(obs);
m = new Msg(obs);
}
if (sentByOracle)
{
m.flags |= Msg.SENTBYORACLE;
}
if ((this.flags & G_ISLARGE) != 0)
{
tokenInfo theToken;
using (var tmpLockObj1 = new LockAndElevate(this.TokenLock))
{
theToken = this.theToken;
}
if (theToken != null && !theToken.IAmLgOwner)
{
if (m.vid != -1 && m.msgid != -1)
{
using (var tmpLockObj2 = new LockAndElevate(this.RelayedLGSendsLock))
{
this.RelayedLGSends.Add(m);
}
}
this.doP2PSend(theToken.groupOwner, true, Vsync.RELAYSEND, m);
return;
}
}
if (isBeacon)
{
// Special, used currently only by ORACLE BeaconTask
m.vid = 0;
m.msgid = -1;
}
else if (vid != Msg.UNINITIALIZED)
{
m.vid = vid;
m.msgid = msgid;
m.nRaw = nRaw;
}
else if (m.vid == Msg.UNINITIALIZED)
{
this.SetMsgIds(m, sentByOracle, isRaw);
}
if ((this.flags & G_SECURE) != 0 && (type == Msg.ISGRPP2P || type == Msg.ISRAWGRPP2P || type == Msg.MULTICAST || type == Msg.RAWMULTICAST || type == Msg.UNORDERED || type == Msg.ISREPLY))
{
this.cipherMsg(m);
}
if ((VsyncSystem.Debug & (VsyncSystem.MESSAGELAYER | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("ReliableSender.SendGroup to <" + this.gname + ">... type=" + Msg.mtypes[type] + ", Msg=" + m);
}
ReliableSender.SendGroup(type, this, m.vid, m.msgid, (byte)(m.flags & ~Msg.CIPHER), Msg.toBArray(m), m.nRaw);
}
return;
}
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("doTheSend");
}
// NOTE: The timeout has to be long because (1) flush can be slow, and worse than that, someone may have just
// crashed without a graceful shutdown; if so, detecting the failure will take something like 2*VSYNC_DEFAULTTIMEOUT time
throw new VsyncException("doTheSend: Group stuck in a wedged state for " + (waitingTime / 1000) + " seconds");
}
internal void SetMsgIds(Msg m, bool sentByOracle, bool isRaw)
{
tokenInfo theToken;
using (var tmpLockObj = new LockAndElevate(this.TokenLock))
{
theToken = this.theToken;
}
if (this.myLoggingFcn != null)
{
m.Lid = this.lookupLoggingId();
}
if ((this.flags & G_ISLARGE) != 0)
{
if (theToken != null)
{
ILock lgb;
using (var tmpLockObj = new LockAndElevate(theToken.FlushingBarrierLock))
{
lgb = theToken.FlushingBarrier;
}
if (lgb != null)
{
if ((VsyncSystem.Debug & VsyncSystem.FLUSHING) != 0)
{
Vsync.WriteLine("Before BarrierWait in SetMsgIds");
}
lgb.BarrierWait();
if ((VsyncSystem.Debug & VsyncSystem.FLUSHING) != 0)
{
Vsync.WriteLine("After BarrierWait in SetMsgIds");
}
}
}
}
string which = "?";
if (!sentByOracle || this == Vsync.ORACLE)
{
if (this.HasFirstView)
{
// Blocks if group view is currently changing
try
{
// Once we've assigned a message id, this member is committed to delivering this message in the current view
ILock.NoteThreadState("Wedged(SetMsgIds).WaitOne()");
this.Wedged.WaitOne();
ILock.NoteThreadState(null);
using (var tmpLockObj = new LockAndElevate(this.CommitLock))
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
m.vid = this.theView.viewid;
m.msgid = this.nextMsgid++;
// Each message tells how many raw messages, sent in a row, it immediately follows
if (isRaw)
{
m.nRaw = this.nRaw++;
}
else
{
m.nRaw = this.nRaw;
this.nRaw = 0;
}
}
which = "HasFirstView";
}
finally
{
this.Wedged.Release();
}
}
else
{
m.vid = 0;
m.msgid = -1;
which = "NoFirstView";
}
}
else if ((this.flags & G_ISLARGE) == 0)
{
Group tp = TrackingProxyLookup(this.gaddr);
if (tp != null && tp.HasFirstView)
{
using (var tmpLockObj = new LockAndElevate(this.CommitLock))
using (var tmpLockObj1 = new LockAndElevate(tp.ViewLock))
{
m.vid = tp.theView.viewid;
m.msgid = tp.nextMsgid++;
m.nRaw = this.nRaw;
this.nRaw = 0;
}
which = "TP/HasFirstView";
}
else
{
m.vid = 0;
m.msgid = -1;
which = "TP/NoFirstView";
}
}
else
{
throw new VsyncException("Oracle is trying to send in a large group: <" + this.gname + ">");
}
m.dest = this.gaddr;
if ((VsyncSystem.Debug & VsyncSystem.MSGIDS) != 0)
{
Vsync.WriteLine("Set msgids(SentByOracle=" + sentByOracle + ") " + which + ": " + m);
}
}
/// <summary>
/// Flushes any unstable messages, pauses until the operation completes, then returns.
/// </summary>
/// <remarks>
/// When using the non-Paxos multicast and query options, conditions can arise in which the user needs to know that past operations
/// on the group have stabilized before some next action (such as checkpointing the group) can occur. For this, call Flush.
/// To learn more, read about the <i>virtual synchrony model</i> as implemented by Vsync.
/// </remarks>
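/// <example>
/// A minimal usage sketch, assuming <c>g</c> is an open Group, that <c>UPDATE</c> is a hypothetical application-defined request code
/// with a registered handler, and that <c>Send</c> is the group's multicast entry point (it is not shown in this part of the file):
/// <code>
/// g.Send(UPDATE, "speed", 42);   // hypothetical update; may still be unstable when Send returns
/// g.Flush();                     // returns once earlier operations on g have stabilized
/// // ... the next action, such as checkpointing the group, can now proceed ...
/// </code>
/// </example>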
public void Flush()
{
this.Flush(int.MaxValue);
}
/// <summary>
/// Flushes unstable messages until k copies have definitely reached their destinations
/// </summary>
/// <param name="k">A "stability threshold" similar to the Paxos stability threshold parameter.</param>
public void Flush(int k)
{
if (!VsyncSystem.VsyncActive)
{
return;
}
checkGroupIsOpen(this);
if ((this.flags & G_ISLARGE) == 0 && k > 0)
{
ReliableSender.waitForStability(k);
}
else
{
this.doFlush(k, new Vsync.UnstableList[0], new Address[0]);
}
}
private static void checkGroupIsOpen(object g)
{
if (g == null)
{
docheckGroup(null);
}
else if (g.GetType() == typeof(Group))
{
docheckGroup((Group)g);
}
else if (g.GetType() == typeof(List<Group>))
{
foreach (Group grp in (List<Group>)g)
{
docheckGroup(grp);
}
}
}
private static void docheckGroup(Group g)
{
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
if (g != null && !g.GroupOpen && g.WasOpen)
{
throw new VsyncException("An operation was attempted on group <" + g.gname + ">, but this group is closed");
}
}
internal void doFlush(int k, Vsync.UnstableList[] usl, Address[] leaving)
{
using (Semaphore CPSSema = new Semaphore(0, int.MaxValue))
{
List<Msg> mustSend;
int cpscnt;
this.startFlush(k, usl, out mustSend, out cpscnt, CPSSema);
this.endFlush(leaving, mustSend, cpscnt, CPSSema);
}
}
internal void startFlush(int k, Vsync.UnstableList[] usl, out List<Msg> mustSend, out int cpscnt, Semaphore CPSSema)
{
mustSend = new List<Msg>();
using (var tmpLockObj = new LockAndElevate(this.UnstableLock))
{
foreach (Vsync.UnstableList us in usl)
{
if (us.flusher == Vsync.my_address)
{
foreach (Msg m in this.Unstable)
{
if (m.gaddr == us.gaddr && m.sender == us.sender && m.vid == us.vid && m.msgid >= us.mid_low && m.msgid <= us.mid_hi)
{
this.Unstable.Remove(m);
this.UnstableCount--;
mustSend.Add(m);
break;
}
}
}
}
}
cpscnt = 0;
using (var tmpLockObj = new LockAndElevate(ReliableSender.PendingSendBufferLock))
{
foreach (ReliableSender.MsgDesc md in ReliableSender.PendingSendBuffer)
{
if (md.group == this)
{
lock (md)
{
if (md.CPSList != null)
{
md.CPSList.Add(CPSSema);
++cpscnt;
}
}
}
}
foreach (ReliableSender.MsgDesc md in ReliableSender.P2PPendingSendBuffer)
{
if (md.group == this)
{
lock (md)
{
if (md.CPSList != null)
{
md.CPSList.Add(CPSSema);
++cpscnt;
}
}
}
}
foreach (ReliableSender.MsgDesc lgmd in ReliableSender.LgPendingSendBuffer)
{
if (lgmd.group == this)
{
lock (lgmd)
{
if (lgmd.CPSList != null)
{
lgmd.CPSList.Add(CPSSema);
++cpscnt;
}
}
}
}
}
}
internal void endFlush(Address[] leaving, List<Msg> mustSend, int cpscnt, Semaphore CPSSema)
{
foreach (Msg m in mustSend)
{
if ((this.flags & G_SECURE) != 0)
{
this.cipherMsg(m);
}
ReliableSender.SendGroup(m.type, this, m, m.sender.isMyAddress());
}
ReliableSender.PendingSendCleanup(leaving);
ReliableSender.CompletePendingSends(this, cpscnt, CPSSema);
}
internal class FragInfo
{
internal Address sender;
internal int fragId;
internal long fragTime;
internal bool[] gotFrag;
internal bool iscomp;
internal bool isRaw;
internal int nFragsRemaining;
internal byte[] body;
internal Msg TrueMsg;
internal LockObject Lock = new LockObject("FragInfo.Lock");
internal FragInfo(Address s, int fid, long tl, int nf, bool ic, bool ir)
{
this.fragTime = Vsync.NOW;
this.sender = s;
this.fragId = fid;
this.body = new byte[tl];
this.nFragsRemaining = nf;
this.gotFrag = new bool[nf];
this.iscomp = ic;
this.isRaw = ir;
if (!Vsync.BigTimeouts && tl > 500000)
{
// Switch to big-timeout mode if incoming object is larger than 500KB
Vsync.BigTimeouts = true;
Vsync.VSYNC_DEFAULTTIMEOUT = Vsync.VSYNC_DEFAULTTIMEOUT * 2;
}
}
}
internal static string deFragState()
{
string s = "Defragmentation in progress:" + Environment.NewLine;
using (var tmpLockObj = new LockAndElevate(dfLock))
{
if (dfList.Count == 0)
{
return string.Empty;
}
foreach (FragInfo fi in dfList)
{
using (var tmpLockObj1 = new LockAndElevate(fi.Lock))
{
string got = " ";
foreach (bool b in fi.gotFrag)
{
got += b ? "+ " : "- ";
}
s += " [sender=" + fi.sender + ", gaddr=" + (fi.TrueMsg != null && fi.TrueMsg.gaddr != null ? fi.TrueMsg.gaddr.ToString() : "null") + ", fragId=" + fi.fragId + ", nFragsRemaining=" + fi.nFragsRemaining + ", got={" + got + "}, final length will be " + fi.body.Length + "] reconstruction underway for " + (Vsync.NOW - fi.fragTime) + "ms" + Environment.NewLine;
}
}
return s;
}
}
internal static LockObject dfLock = new LockObject("dfLock");
internal static LockObject sendInFragsLock = new LockObject("sendInFragsLock");
internal static List<FragInfo> dfList = new List<FragInfo>();
internal static int nextFid;
internal static void fiCleanup(Address gaddr)
{
using (var tmpLockObj = new LockAndElevate(dfLock))
{
if (dfList.Count == 0)
{
return;
}
List<FragInfo> ndflist = new List<FragInfo>();
foreach (FragInfo fi in dfList)
{
if (fi.TrueMsg == null || (fi.TrueMsg.gaddr != null && fi.TrueMsg.gaddr != gaddr))
{
ndflist.Add(fi);
}
}
dfList = ndflist;
}
}
internal static void fiCleanup()
{
using (var tmpLockObj = new LockAndElevate(dfLock))
{
if (dfList.Count == 0)
{
return;
}
List<FragInfo> ndflist = new List<FragInfo>();
foreach (FragInfo fi in dfList)
{
// Raw packets time out rapidly since the fragments might never be retransmitted. Reliable packets, in contrast, linger a long time since
// in principle, any missing fragments will be resent "soon"
if (fi.isRaw ? (Vsync.NOW - fi.fragTime < Math.Max(500, fi.gotFrag.Length * 25)) : (Vsync.NOW - fi.fragTime < Math.Max(15000, fi.gotFrag.Length * 250)))
{
ndflist.Add(fi);
}
}
dfList = ndflist;
}
}
internal static FragInfo deFragLookup(Group g, Address sender, int fid, long tl, int nf, bool iscomp, bool isRaw)
{
using (var tmpLockObj = new LockAndElevate(dfLock))
{
foreach (FragInfo dfi in dfList)
{
if (dfi.sender == sender && dfi.fragId == fid)
{
return dfi;
}
}
}
using (var tmpLockObj = new LockAndElevate(Vsync.RIPLock))
{
if (Vsync.RIPList.Contains(sender))
{
return null;
}
}
FragInfo fi = new FragInfo(sender, fid, tl, nf, iscomp, isRaw);
using (var tmpLockObj = new LockAndElevate(dfLock))
{
dfList.Add(fi);
}
return fi;
}
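// Fragment flag bits carried in each FRAGMENT message. Each flag needs its own bit so that ffic() and ffir() can tell them apart.
internal const byte FFIC = 0x01; // the payload was compressed before fragmentation
internal const byte FFIR = 0x02; // the fragments were sent raw (unreliably)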
internal static byte fflags(bool ic, bool isRaw)
{
byte rv = 0;
if (ic)
{
rv |= FFIC;
}
if (isRaw)
{
rv |= FFIR;
}
return rv;
}
internal static bool ffic(byte ff)
{
return (ff & FFIC) != 0;
}
internal static bool ffir(byte ff)
{
return (ff & FFIR) != 0;
}
internal static void deFragGotFrag(Group g, Address sender, int fragId, long trueLen, int nFrags, int fragN, byte fflags, byte[] frag)
{
FragInfo fi = deFragLookup(g, sender, fragId, trueLen, nFrags, ffic(fflags), ffir(fflags));
if ((VsyncSystem.Debug & VsyncSystem.FRAGER) != 0)
{
Vsync.WriteLine("deFragmenter got a fragment for a " + trueLen + " byte object; senderId=" + sender + ", fid=" + fragId + ", this was fragment " + fragN + (fi == null ? " (** SENDER ON RIP LIST: IGNORING **)" : string.Empty));
}
if (fi == null)
{
return;
}
bool doDelivery = false;
List<FragInfo> toRemove = new List<FragInfo>();
using (var tmpLockObj = new LockAndElevate(fi.Lock))
{
if (fi.gotFrag[fragN])
{
return;
}
fi.gotFrag[fragN] = true;
Buffer.BlockCopy(frag, 0, fi.body, (int)(fragN * Vsync.VSYNC_FRAGLEN), Buffer.ByteLength(frag));
if (--fi.nFragsRemaining == 0)
{
if (fi.TrueMsg != null)
{
doDelivery = true;
}
else
{
toRemove.Add(fi);
}
}
}
using (var tmpLockObj = new LockAndElevate(dfLock))
{
foreach (FragInfo rfi in toRemove)
{
dfList.Remove(rfi);
}
}
if (doDelivery)
{
fragDoDelivery(g, fi);
}
}
internal static void deFragRdv(Group g, Msg m)
{
Msg outer = (Msg)Msg.BArrayToObjects(m.payload, typeof(Msg))[0];
object[] objs = Msg.BArrayToObjects(outer.payload, typeof(Address), typeof(int), typeof(long), typeof(int), typeof(byte));
int idx = 0;
Address sender = (Address)objs[idx++];
int fid = (int)objs[idx++];
long tl = (long)objs[idx++];
int nf = (int)objs[idx++];
bool ic = ffic((byte)objs[idx]);
bool ir = ffir((byte)objs[idx]);
FragInfo fi = deFragLookup(g, sender, fid, tl, nf, ic, ir);
if (fi == null)
{
return;
}
bool doDelivery = false;
using (var tmpLockObj = new LockAndElevate(fi.Lock))
{
if ((VsyncSystem.Debug & VsyncSystem.FRAGER) != 0)
{
Vsync.WriteLine("deFragmenter rdv: senderId=" + sender + ", fid=" + fid);
}
fi.TrueMsg = m;
if (fi.nFragsRemaining == 0)
{
doDelivery = true;
}
}
if (doDelivery)
{
fragDoDelivery(g, fi);
}
}
private static void fragDoDelivery(Group g, FragInfo fi)
{
Msg m;
using (var tmpLockObj = new LockAndElevate(dfLock))
{
dfList.Remove(fi);
}
if (fi.TrueMsg == null)
{
throw new VsyncException("Fragger: Rdv failure");
}
using (var tmpLockObj = new LockAndElevate(fi.Lock))
{
if (fi.iscomp)
{
fi.body = ReliableSender.DeCompress(fi.body);
}
m = new Msg(fi.body) { dest = fi.TrueMsg.dest, flags = (byte)((fi.TrueMsg.flags & ~Msg.FRAGGED) | Msg.DEFRAGGED), gaddr = fi.TrueMsg.gaddr, msgid = fi.TrueMsg.msgid, sender = fi.TrueMsg.sender, Lid = fi.TrueMsg.Lid, UID = -1, type = fi.TrueMsg.type, vid = fi.TrueMsg.vid, asReceived = fi.TrueMsg };
// Don't use the reassembled message to satisfy retransmission requests
// The "true message" will be used if responding to a NACK using a message on m.unstable
deFragDone(fi);
}
m.myObs = null;
if ((m.flags & Msg.TOKEN) != 0)
{
ReliableSender.gotToken(m);
}
else if ((m.flags & Msg.HASREPLY) != 0)
{
AwaitReplies.gotReply(m);
}
else
{
if (m.type == Msg.ISGRPP2P || m.type == Msg.ISRAWGRPP2P)
{
g.doAction(m);
}
else
{
g.doDeliveryCallbacks(m, "deFragGotFrag", Msg.MULTICAST);
}
}
}
internal static void deFragDone(FragInfo fi)
{
fi.body = null;
fi.TrueMsg = null;
}
internal static void deFragNoteFailure(Address who)
{
using (var tmpLockObj = new LockAndElevate(dfLock))
{
List<FragInfo> newList = new List<FragInfo>();
foreach (FragInfo fi in dfList)
{
if (fi.sender != who)
{
newList.Add(fi);
}
}
dfList = newList;
}
}
private static bool reEnteredSIF;
internal static byte[] SendInFrags(bool p2p, bool isRaw, Address dest, Group g, byte[] buffer, byte[] bufferAsGiven)
{
bool ic = bufferAsGiven != null;
using (var tmpLockObj = new LockAndElevate(sendInFragsLock))
{
if (reEnteredSIF)
{
throw new VsyncException("Recursive entry to SendInFrags");
}
reEnteredSIF = true;
int fid = ++nextFid;
long bl = buffer.Length;
int off = 0;
int nf = (int)((bl + Vsync.VSYNC_FRAGLEN - 1) / Vsync.VSYNC_FRAGLEN);
// nf is the ceiling of bl / VSYNC_FRAGLEN. For illustration only: if VSYNC_FRAGLEN were 1000, a 2500-byte buffer
// would yield nf = 3 fragments of 1000, 1000 and 500 bytes.
long wbl = bl;
for (int n = 0; n < nf; n++)
{
byte[] frag = new byte[Math.Min(wbl, Vsync.VSYNC_FRAGLEN)];
Buffer.BlockCopy(buffer, off, frag, 0, Buffer.ByteLength(frag));
wbl -= frag.Length;
off += frag.Length;
if (g != null)
{
// These next lines can't run through the flow-controlled versions of P2PSend and Send because of the risk of a deadlock.
// The core issue is that SendInFrags is currently a static method, since I use it both within groups and also for pure
// Vsync-Vsync communication outside of groups (for example, retransmission of a message on the unstable list). Thus if we allowed
// flow control to block send A, it might be that for the FC state to drain, message B needs to be retransmitted, and for this, fragmented.
if (p2p)
{
if (isRaw)
{
g.RawP2PSend(dest, Vsync.FRAGMENT, Vsync.my_address, fid, (long)buffer.Length, nf, n, fflags(ic, isRaw), frag);
}
else
{
g.doP2PSend(dest, true, Vsync.FRAGMENT, Vsync.my_address, fid, (long)buffer.Length, nf, n, fflags(ic, isRaw), frag);
}
}
else
{
g.doSend(g == Vsync.ORACLE && dest != Vsync.ORACLE.gaddr, isRaw, Vsync.FRAGMENT, Vsync.my_address, fid, (long)buffer.Length, nf, n, fflags(ic, isRaw), frag);
}
}
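else
{
// Clear the re-entry guard before throwing; otherwise every later call would be rejected as a recursive entry
reEnteredSIF = false;
throw new VsyncException("SendInFragments: can't fragment a p2p non-group message");
}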
}
if ((VsyncSystem.Debug & VsyncSystem.FRAGER) != 0)
{
Vsync.WriteLine("Fragmented a " + bl + " byte object and sent it as " + nf + " fragments using senderId=" + Vsync.my_address + ", fid=" + fid);
}
// Surgery: replace the old message with a reference that will let us match it with the fragmented message
Msg outer = new Msg(bufferAsGiven ?? buffer);
Msg inner = Msg.InnerMsg(outer.payload);
if (inner == null)
{
// Special for fragmented tokens
inner = new Msg(Vsync.my_address, fid, bl, nf, fflags(ic, isRaw)) { dest = g.gaddr };
}
else
{
// Normal case: everything except tokens
inner.payload = Msg.toBArray(Vsync.my_address, fid, bl, nf, fflags(ic, isRaw));
inner.cipherPayload = null;
inner.flags &= ~Msg.CIPHER & 0xFF;
}
outer.payload = Msg.toBArray(inner);
outer.cipherPayload = null;
outer.flags &= ~Msg.CIPHER & 0xFF;
outer.myObs = inner.myObs = null;
reEnteredSIF = false;
return outer.toBArray();
}
}
internal byte[] cipherBuf(byte[] buffer)
{
if (this.myAes == null)
{
throw new VsyncException("<" + this.gname + ">: cipherBuf but myAES=null");
}
if (buffer == null)
{
throw new VsyncException("Buffer null in cipherBuf");
}
if (buffer.Length == 0)
{
return buffer;
}
return this.encipher(buffer);
}
internal byte[] CryptoWrap(byte[] buffer)
{
return Msg.toBArray(Vsync.CRYPTOWRAPPED, this.cipherBuf(buffer));
}
internal void cipherMsg(Msg m)
{
using (var tmpLockObj = new LockAndElevate(m.Lock))
{
if (m.vid < 0 || m.msgid < 0 || this.myAes == null || m.cipherPayload != null)
{
return;
}
m.myObs = null;
using (var tmpLockObj1 = new LockAndElevate(this.myAesLock))
{
m.cipherPayload = this.encipher(m.payload);
}
m.flags |= Msg.CIPHER;
}
}
internal byte[] decipherBuf(byte[] buffer)
{
if (this.myAes == null)
{
throw new VsyncException("<" + this.gname + ">: decipherBuf but myAES=null");
}
if (buffer.Length == 0)
{
return buffer;
}
return this.decipher(buffer);
}
internal void decipherMsg(Msg m)
{
if ((m.payload != null && m.cipherPayload != null) || (m.flags & Msg.CIPHER) == 0)
{
return;
}
using (var tmpLockObj = new LockAndElevate(m.Lock))
{
m.myObs = null;
if (m.cipherPayload == null && m.payload != null)
{
m.cipherPayload = m.payload;
}
using (var tmpLockObj1 = new LockAndElevate(this.myAesLock))
{
m.payload = this.decipher(m.cipherPayload);
}
}
}
private byte[] encipher(byte[] buffer)
{
int nb = this.myAes.BlockSize >> 3;
byte[] IV = new byte[nb];
this.AesSeed.GetBytes(IV);
return encipher(this, this.myAes, this.myAesLock, IV, buffer);
}
// Output layout: the IV (one AES block, sent in the clear) followed by the ciphertext of
// [4-byte big-endian plaintext length | plaintext | zero padding up to a multiple of the block size]
internal static byte[] encipher(Group g, Aes myAes, LockObject myAesLock, byte[] iv, byte[] buffer)
{
using (var tmpLockObj = new LockAndElevate(myAesLock))
{
int nb = myAes.BlockSize >> 3;
myAes.IV = iv;
if ((VsyncSystem.Debug & VsyncSystem.CIPHER) != 0)
{
tokenInfo.dumpBv((g != null ? ("<" + g.gname + "> ") : "<null> ") + "[encipher]myAes.Key=", myAes.Key);
tokenInfo.dumpBv((g != null ? ("<" + g.gname + "> ") : "<null> ") + "[encipher]myAes.IV=", myAes.IV);
Vsync.WriteLine((g != null ? ("<" + g.gname + "> ") : "<null> ") + "[encipher]Buffer length is " + buffer.Length);
}
int len = (((buffer.Length + 4) + (nb - 1)) / nb) * nb;
byte[] padded = new byte[len];
int idx = 0;
padded[idx++] = (byte)((buffer.Length >> 24) & 0xFF);
padded[idx++] = (byte)((buffer.Length >> 16) & 0xFF);
padded[idx++] = (byte)((buffer.Length >> 8) & 0xFF);
padded[idx++] = (byte)(buffer.Length & 0xFF);
while (idx - 4 < buffer.Length)
{
padded[idx] = buffer[idx - 4];
idx++;
}
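// Zero-fill the tail of the padded buffer (a freshly allocated array is already zeroed, so this only makes the intent explicit)
while (idx < len)
{
padded[idx++] = 0;
}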
ICryptoTransform myEncryptor;
using (myEncryptor = myAes.CreateEncryptor(myAes.Key, myAes.IV))
{
MemoryStream msEncrypt = null;
try
{
msEncrypt = new MemoryStream();
CryptoStream csEncrypt = null;
try
{
csEncrypt = new CryptoStream(msEncrypt, myEncryptor, CryptoStreamMode.Write);
csEncrypt.Write(padded, 0, padded.Length);
byte[] ev = msEncrypt.ToArray();
byte[] result = new byte[iv.Length + ev.Length];
Buffer.BlockCopy(iv, 0, result, 0, Buffer.ByteLength(iv));
Buffer.BlockCopy(ev, 0, result, iv.Length, Buffer.ByteLength(ev));
return result;
}
finally
{
if (csEncrypt != null)
{
csEncrypt.Dispose();
msEncrypt = null;
}
}
}
finally
{
if (msEncrypt != null)
{
msEncrypt.Dispose();
}
}
}
}
}
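// Input layout: the IV (one AES block) followed by the ciphertext; see encipher.
// Callers such as decipherMsg may already hold myAesLock, which is re-acquired here.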
private byte[] decipher(byte[] buffer)
{
using (var tmpLockObj = new LockAndElevate(this.myAesLock))
{
int nb = this.myAes.BlockSize >> 3;
byte[] iv = new byte[nb];
Buffer.BlockCopy(buffer, 0, iv, 0, nb);
this.myAes.IV = iv;
if ((VsyncSystem.Debug & VsyncSystem.CIPHER) != 0)
{
tokenInfo.dumpBv("<" + this.gname + "> [decipher]myAes.Key=", this.myAes.Key);
tokenInfo.dumpBv("<" + this.gname + "> [decipher]myAes.IV=", this.myAes.IV);
}
using (this.myDecryptor = this.myAes.CreateDecryptor(this.myAes.Key, this.myAes.IV))
{
MemoryStream msDecrypt = null;
try
{
msDecrypt = new MemoryStream(buffer, nb, buffer.Length - nb);
CryptoStream csDecrypt = null;
try
{
csDecrypt = new CryptoStream(msDecrypt, this.myDecryptor, CryptoStreamMode.Read);
byte[] padded = new byte[this.myAes.BlockSize >> 3];
csDecrypt.Read(padded, 0, padded.Length);
int len = (padded[0] << 24) + (padded[1] << 16) + (padded[2] << 8) + padded[3];
if (len < 0 || len > Vsync.VSYNC_MAXMSGLENTOTAL)
{
Vsync.WriteLine("WARNING: Decryption failure (object may be corrupted, or may have been enciphered with a different key)");
return new byte[0];
}
if ((VsyncSystem.Debug & VsyncSystem.CIPHER) != 0)
{
Vsync.WriteLine("[decipher]Length will be " + len);
}
byte[] result = new byte[len];
int idx = 4;
int off = 0;
while (len-- > 0)
{
if (idx == padded.Length)
{
csDecrypt.Read(padded, idx = 0, padded.Length);
}
result[off++] = padded[idx++];
}
return result;
}
finally
{
if (csDecrypt != null)
{
csDecrypt.Dispose();
msDecrypt = null;
}
}
}
finally
{
if (msDecrypt != null)
{
msDecrypt.Dispose();
}
}
}
}
}
/// <summary>
/// Sends one message in a stream of messages that constitute a state transfer or checkpoint.
/// </summary>
/// <param name="obs">A variable-length list of arguments that must match one of the checkpoint loading methods</param>
/// <remarks>
/// Sends one message in a stream of messages that constitute a state transfer or checkpoint.
/// Each of these messages can contain any objects that the Vsync Msg layer is able to marshal,
/// namely predefined types known to Vsync, or other object types that the user has registered via <see cref="Msg.RegisterType"/>.
/// The system matches each incoming checkpoint message with the list of registered handlers, invoking the handler(s) that have
/// exact matches with the types of the incoming message.
/// </remarks>
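/// <example>
/// A minimal sketch of the body of a checkpoint-making routine, assuming <c>g</c> is this Group and <c>table</c> is a hypothetical
/// application dictionary mapping string keys to int values, with a matching (string, int) checkpoint loader registered elsewhere:
/// <code>
/// foreach (string key in table.Keys)
/// {
///     g.SendChkpt(key, table[key]);   // one checkpoint message per entry
/// }
/// g.EndOfChkpt();                     // marks the end of the state transfer or checkpoint
/// </code>
/// </example>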
public void SendChkpt(params object[] obs)
{
object[] args = new object[obs.Length + 1];
args[0] = Vsync.STATEXFER;
for (int i = 0; i < obs.Length; i++)
{
args[i + 1] = obs[i];
}
if (this.myChkptStream == null)
{
foreach (Address a in this.nextView.joiners)
{
bool sendToHim = this.theChkptChoser == null || this.theChkptChoser(this.nextView, a);
if (sendToHim)
{
this.doP2PSend(a, true, args);
}
}
}
else
{
try
{
if (obs.Length > 0)
{
byte[] buffer = Msg.toBArray(obs);
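if (this.userSpecifiedKey)
{
// cipherBuf returns a new enciphered array; keep the result so that the checkpoint is actually written encrypted
buffer = this.cipherBuf(buffer);
}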
int len = buffer.Length;
byte[] lb = new byte[4];
lb[0] = (byte)(len & 0xFF);
len >>= 8;
lb[1] = (byte)(len & 0xFF);
len >>= 8;
lb[2] = (byte)(len & 0xFF);
len >>= 8;
lb[3] = (byte)(len & 0xFF);
this.myChkptStream.Write(lb, 0, 4);
this.myChkptStream.Write(buffer, 0, buffer.Length);
}
else
{
this.makingCheckpoint = false;
this.myChkptStream.Close();
this.myChkptStream = null;
if (!File.Exists(this.myCheckpointFile + ".chkpt"))
{
File.Create(this.myCheckpointFile + ".chkpt").Close();
}
if (!File.Exists(this.myCheckpointFile + ".bak"))
{
File.Create(this.myCheckpointFile + ".bak").Close();
}
bool replaced = false;
int tried = 0;
while (!replaced && tried++ < 3)
{
try
{
File.Replace(this.myCheckpointFile + ".tmp", this.myCheckpointFile + ".chkpt", this.myCheckpointFile + ".bak", false);
replaced = true;
}
catch (IOException)
{
if (tried == 3)
{
throw;
}
Vsync.Sleep(250);
}
}
}
}
catch (Exception e)
{
throw new VsyncException("I/O error: " + e.Message + " while writing checkpoint", e);
}
}
this.VsyncCallDone();
}
/// <summary>
/// Called as an end-of-state-transfer or end-of-checkpoint marker.
/// </summary>
public void EndOfChkpt()
{
if (!this.inhibitEOC)
{
this.SendChkpt();
}
this.VsyncCallDone();
}
/*private static string bprnt(byte[] buf)
{
string s = string.Empty;
for (int b = 0; b < buf.Length; b++)
{
s = s + " " + buf[b];
}
return s;
}*/
/// <summary>
/// Issues an unordered, non-durable point-to-point request to a designated member of some group and then waits for it to
/// reply.
/// </summary>
/// <param name="dest">the target for the query</param>
/// <param name="obs">A variable list of arguments that specify the request id and the arguments to some handler</param>
/// <returns>The reply, encoded as a byte[] array</returns>
/// <remarks>
/// Issues an unordered, non-durable point-to-point request to a designated member of some group and then waits for it to
/// reply (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
/// The first parameter is an object of type <see cref="Timeout"/> and specifies a timeout after which the Query ceases to wait for a non-responsive
/// member, and the default action to take in that case. The next parameter is the request handle: a small integer identifying this request.
/// Remaining parameters become typed arguments to the handler for the request. Returns a byte[] array in which the reply is encoded.
/// Normally, the user would employ Msg.BArraysToObjects() or Msg.InvokeFromBArrays() to decode these replies.
/// </remarks>
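/// <example>
/// A minimal sketch, assuming <c>timeout</c> is a previously constructed <see cref="Timeout"/>, <c>LOOKUP</c> is a hypothetical request code whose
/// handler replies with a single int, and <c>Msg.BArrayToObjects</c> is accessible to the caller (this file uses it internally for the same decoding step):
/// <code>
/// byte[] reply = g.P2PQueryToBarray(dest, timeout, LOOKUP, "speed");
/// if (reply != null)
/// {
///     if (reply.Length != 0)
///     {
///         object[] obs = Msg.BArrayToObjects(reply, typeof(int));
///         int value = (int)obs[0];   // the handler's reply, decoded
///     }
/// }
/// </code>
/// </example>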
public byte[] P2PQueryToBarray(Address dest, params object[] obs)
{
if (!this.VsyncCallStart())
{
return null;
}
Timeout timeout;
splitObs(this, out timeout, ref obs);
this.cbCheck(obs);
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
byte[] rv = ReliableSender.QueryP2P(Msg.ISGRPP2P, dest, timeout, this, buffer);
this.VsyncCallDone();
return rv;
}
/// <summary>
/// Issues an unordered, non-durable point-to-point request to a designated member of some group and then waits for it to
/// reply.
/// </summary>
/// <param name="dest">the target for the query</param>
/// <param name="obs">A variable list of arguments that specify the request id and the arguments to some handler</param>
/// <returns>Returns result via user-supplied list objects</returns>
/// <remarks>
/// Issues an unordered, non-durable point-to-point request to a designated member of some group and then waits for it to
/// reply (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
/// The first parameter is an object of type <see cref="Timeout"/> and specifies a timeout after which the Query ceases to wait for a non-responsive
/// member, and the default action to take in that case. The next parameter is the request handle: a small integer identifying this request.
/// Remaining parameters become typed arguments to the handler for the request. The argument list also carries the user-supplied list objects that receive the reply.
/// Returns true if a reply was received and decoded into those objects, and false if the call could not start or the reply was missing or empty.
/// </remarks>
public bool P2PQuery(Address dest, params object[] obs)
{
if (!this.VsyncCallStart())
{
return false;
}
FlowControl.FCBarrierCheck();
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref obs, out resRefs);
this.cbCheck(obs);
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
buffer = ReliableSender.QueryP2P(Msg.ISGRPP2P, dest, timeout, this, buffer);
List<byte[]> barrays = new List<byte[]>();
if (buffer == null || buffer.Length == 0)
{
this.VsyncCallDone();
return false;
}
barrays.Add(buffer);
Msg.BArraysToLists(resRefs, barrays);
this.VsyncCallDone();
return true;
}
internal byte[] doP2PQuery(Address dest, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
this.cbCheck(obs);
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
return ReliableSender.QueryP2P(Msg.ISGRPP2P, dest, timeout, this, buffer);
}
internal byte[] doPureP2PQuery(Address dest, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
this.cbCheck(obs);
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
return ReliableSender.QueryP2P(Msg.ISPUREP2P, dest, timeout, this, buffer);
}
/// <summary>
/// Sends a point-to-point message to a specific member of the current group, specified via the dest field.
/// </summary>
/// <param name="dest">Which member of the group to send to</param>
/// <param name="obs">A variable list of arguments that specify the request id and the arguments to some handler</param>
/// <remarks>
/// Sends a point-to-point message to a specific member of the current group, specified via the dest parameter.
/// The first parameter in obs specifies the request handle: a small integer identifying this request.
/// Remaining parameters become typed arguments to the handler for the request.
/// </remarks>
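/// <example>
/// A minimal sketch, assuming <c>UPDATE</c> is a hypothetical request code for which the destination has registered a (string, int) handler:
/// <code>
/// g.P2PSend(dest, UPDATE, "speed", 42);   // request handle first, then the handler's typed arguments
/// </code>
/// </example>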
public void P2PSend(Address dest, params object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
FlowControl.FCBarrierCheck();
this.cbCheck(obs);
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
ReliableSender.SendP2P(Msg.ISGRPP2P, dest, this, buffer, true);
this.VsyncCallDone();
}
internal void doP2PSend(Address dest, bool localSender, params object[] obs)
{
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
ReliableSender.SendP2P(Msg.ISGRPP2P, dest, this, buffer, localSender);
}
internal void doPureP2PSend(Address dest, bool localSender, params object[] obs)
{
byte[] buffer = Msg.toBArray(obs);
ReliableSender.SendP2P(Msg.ISPUREP2P, dest, null, buffer, localSender);
}
/// <summary>
/// Used to send an unreliable datagram. Best if the object size is well below the VSYNC_MAXMSGLEN
/// </summary>
/// <param name="dest">Target node within this group</param>
/// <param name="obs">Request code and parameters</param>
/// <remarks>
/// RawP2PSend is used to send an unreliable datagram, for example in support of a gossip-push protocol.
/// No attempt will be made to retransmit if the message is dropped.
/// Caution: if a very large object is sent this way, and some fragments are lost, resources will be tied up
/// for many seconds on the receiver until the garbage collection logic notices the unrepaired gap and discards
/// the partial packet. We recommend that RawP2PSend not be used for objects that exceed the VSYNC_MAXMSGLEN
/// limit. In fact, because your data will be encoded and might be enciphered, you need to limit your packets
/// to a considerably smaller size: in the worst case many hundreds of bytes of overhead may be added by
/// these mechanisms, and by the Msg serialization code.
///
/// Please note that when using RawP2PSend at high rates, loss rates can spike unless the sender employs a non-zero value for the VSYNC_RATELIM parameter,
/// which limits the number of packets transmitted per second. See the user manual for details
/// </remarks>
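/// <example>
/// A minimal gossip-push sketch, assuming <c>GOSSIP</c> is a hypothetical request code, <c>peer</c> is a group member chosen by the application,
/// and <c>digest</c> is a small byte[] kept well under VSYNC_MAXMSGLEN:
/// <code>
/// g.RawP2PSend(peer, GOSSIP, digest);   // fire and forget: a dropped datagram is never retransmitted
/// </code>
/// </example>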
public void RawP2PSend(Address dest, params object[] obs)
{
if (!this.VsyncCallStart())
{
return;
}
FlowControl.FCBarrierCheck();
this.cbCheck(obs);
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
ReliableSender.SendP2P(Msg.ISRAWGRPP2P, dest, this, buffer, true);
this.VsyncCallDone();
}
/// <summary>
/// Used to send an unreliable query, to which the receiver will reply. Best if the object size is well below the VSYNC_MAXMSGLEN
/// </summary>
/// <param name="dest">Target node, within this group</param>
/// <param name="obs">Request code and parameters</param>
/// <remarks>
/// RawP2PQuery is used to send an unreliable query, for example in support of a gossip-pull or push-pull protocol.
/// No attempt will be made to retransmit if the message is dropped.
/// Caution: if a very large object is sent this way, and some fragments are lost, resources will be tied up
/// for many seconds on the receiver until the garbage collection logic notices the unrepaired gap and discards
/// the partial packet. We recommend that RawP2PQuery not be used for objects that exceed the VSYNC_MAXMSGLEN
/// limit. In fact, because your data will be encoded and might be enciphered, you need to limit your packets
/// to a considerably smaller size: in the worst case many hundreds of bytes of overhead may be added by
/// these mechanisms, and by the Msg serialization code.
///
/// RawP2PQuery is useful in time-sensitive applications where a message that might be delayed and need to be
/// resent would be of low value because of the elapsed time. Often, one uses RawReply to respond to such a
/// Query, for the same reason.
///
/// If a timeout occurs, the specified timeout action will be taken
/// </remarks>
public bool RawP2PQuery(Address dest, params object[] obs)
{
if (!this.VsyncCallStart())
{
return false;
}
FlowControl.FCBarrierCheck();
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref obs, out resRefs);
this.cbCheck(obs);
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
byte[] buffer = Msg.toBArray(obs);
if ((this.flags & G_SECURE) != 0)
{
buffer = this.CryptoWrap(buffer);
}
buffer = ReliableSender.QueryP2P(Msg.ISRAWGRPP2P, dest, timeout, this, buffer);
List<byte[]> barrays = new List<byte[]>();
if (buffer == null || buffer.Length == 0)
{
this.VsyncCallDone();
return false;
}
barrays.Add(buffer);
Msg.BArraysToLists(resRefs, barrays);
this.VsyncCallDone();
return true;
}
/// <summary>
/// Used to send an unreliable datagram in reply to a RawP2PQuery. Best if the reply object size is well below the VSYNC_MAXMSGLEN limit.
/// </summary>
/// <param name="obs">The reply data</param>
/// <remarks>
/// RawReply is used in conjunction with RawP2PQuery, and sends a reply unreliably: no attempt will be made to retransmit if the message is dropped.
/// Caution: if a very large object is sent this way and some fragments are lost, resources will be tied up
/// for many seconds on the receiver until the garbage collection logic notices the unrepaired gap and discards
/// the partial packet. We recommend that RawReply not be used for objects that exceed the VSYNC_MAXMSGLEN
/// limit. In fact, because your data will be encoded and might be enciphered, you need to limit your packets
/// to a considerably smaller size: in the worst case many hundreds of bytes of overhead may be added by
/// these mechanisms and by the Msg serialization code.
///
/// Note that Reply() also works for RawP2PQuery: the query is sent unreliably but the reply will be acked. However, this mixture is uncommon.
/// The converse is also true: nothing prevents an application from using RawReply to respond to a Query sent reliably.
///
/// If the reply is dropped, the Query or RawQuery will eventually time out and take the indicated timeout action.
/// </remarks>
public void RawReply(params object[] obs)
{
if (!this.GroupOpen && this.WasOpen)
{
return;
}
if (!this.VsyncCallStart())
{
return;
}
Msg replyTo;
using (var tmpLockObj = new LockAndElevate(Rlock))
{
replyTo = this.getReplyToAndClear();
}
if (replyTo == null)
{
throw new VsyncException("Attempted to reply twice to same message, or to a message that wasn't a query");
}
if ((VsyncSystem.Debug & VsyncSystem.REPLYWAIT) != 0)
{
Vsync.WriteLine("Sending reply to " + replyTo.sender + ", " + replyTo.vid + ":" + replyTo.msgid);
}
byte[] result = Msg.toBArray(obs);
bool enciphered = false;
if ((this.flags & G_SECURE) != 0 && (replyTo.flags & Msg.ENCIPHEREDREPLY) != 0)
{
enciphered = true;
result = this.cipherBuf(result);
}
bool deliverToOracle = (replyTo.flags & Msg.SENTBYORACLE) != 0;
byte[] buffer = Msg.toBArray(RT_REPLY, replyTo.vid, replyTo.msgid, deliverToOracle, enciphered, result);
Vsync.PendingLeaderOps plos;
using (var tmpLockObj = new LockAndElevate(this.groupLock))
{
plos = this.NotifyDALOnReply;
}
if (plos != null && plos.reqMsg == replyTo)
{
Vsync.DALReplyNotify(this, new Msg(this.gaddr, Msg.ISREPLY, replyTo.sender, Msg.NewMsgAsBArray(Vsync.my_address, this.gaddr, this.theView.viewid, -1, 0L, 0, 0, 0, buffer), this.theView.viewid, -1), plos, replyTo);
}
ReliableSender.SendP2P(Msg.ISRAWREPLY, replyTo.sender, this.rgroup(replyTo.gaddr), this.theView == null ? 0 : this.theView.viewid, ReliableSender.P2PSequencer.NextP2PSeqn("reply/1", replyTo.sender), buffer, true, null, replyTo);
this.VsyncCallDone();
}
/// <exclude>
/// <summary>
/// Internal, declared public to satisfy C# scoping requirement.
/// </summary>
/// <param name="obs"></param>
/// </exclude>
public delegate void querySender(params object[] obs);
/// <summary>
/// Issues an unordered, non-durable multicast to the group and waits for replies, returning them as a List with one byte[] vector per reply.
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="timeout">Vsync.Timeout object specifying the timeout and action to take if it occurs before reply is received</param>
/// <param name="obs">variable-length list specifying the method to invoke and the parameters to that method</param>
/// <returns>a List containing one byte[] vector per received reply</returns>
/// <remarks>
/// Issues an unordered, non-durable multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
///
/// This is the very fastest form of virtually synchronous query, but can be lost in the event of a failure. Typically performed as a single IP multicast with
/// instant invocation of the handler routine in the receiving processes. Read about the <it>virtual synchrony model</it> to learn more about when Query is a safe choice.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY.
/// The timeout parameter is an object of type <see cref="Timeout"/>: it specifies how long the Query waits for a non-responsive
/// member, and the default action to take if that time elapses. The first element of obs is the request handle: a small integer identifying this request.
/// The remaining elements become typed arguments to the handler for the request. Returns a List in which each entry is a reply from one process.
/// Normally, the user would employ Msg.BArraysToObjects() or Msg.InvokeFromBArrays() to decode these replies.
/// </remarks>
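/// <example>
/// A minimal usage sketch, not taken from the Vsync sources. It assumes a joined Group g, that the ALL constant
/// used by this class is reachable as Group.ALL, and an application-defined request code GET_AGE
/// (g and GET_AGE are hypothetical names):
/// <code>
/// var replies = g.QueryToBA(Group.ALL, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith");
/// // One byte[] per member that replied. The list is empty if the group is not open,
/// // has no first view yet, or nreplies was 0.
/// foreach (byte[] reply in replies)
/// {
///     // Decode each entry here, for instance with the Msg helpers named in the remarks above.
/// }
/// </code>
/// </example>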
public List<byte[]> QueryToBA(int nreplies, Timeout timeout, params object[] obs)
{
return this._Query(false, false, this.doSendNotFromOracle, nreplies, timeout, obs);
}
/// <summary>
/// Unreliable version of QueryToBA
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="timeout">timeout action</param>
/// <param name="obs">arguments to the call</param>
/// <returns>a List containing one byte[] vector per received reply</returns>
public List<byte[]> RawQueryToBA(int nreplies, Timeout timeout, params object[] obs)
{
return this._Query(false, true, this.doRawSendNotFromOracle, nreplies, timeout, obs);
}
private List<byte[]> QueryToBAFromSystem(int vid, int mid, int nRaw, int nreplies, Timeout timeout, params object[] obs)
{
return this._Query(vid, mid, nRaw, false, false, this.doSendNotFromOracle, nreplies, timeout, obs);
}
/// <summary>
/// Issues an unordered and reliable but potentially non-durable multicast to the group, then waits for replies
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters to method being invoked, EOLMarker, vectors for received results</param>
/// <returns>how many replies were actually received</returns>
/// <remarks>
/// Issues an unordered and reliable but potentially non-durable multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
///
/// This is the very fastest way to query a Vsync group; it typically requires a single IP multicast per invocation. However, there are obscure failure patterns that
/// could cause it to be nondurable (the query is "lost" in the event of certain sequences of failures).
/// Read about the <it>virtual synchrony model</it> to learn more about when stronger Query variations would be needed.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. The call returns the number of replies received; the replies themselves
/// are stored in a series of lists, which the caller passes after a marker, the Vsync EOLmarker, that separates
/// the arguments of the invoked method from the places to put received replies.
///
/// For example: List&lt;int&gt; hisAge = new List&lt;int&gt;(); int nreps = Query(1, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith", Vsync.EOLmarker, hisAge);
/// </remarks>
public int Query(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.QueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
/// <summary>
/// Like Query, but uses RawSend
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters to method being invoked, EOLMarker, vectors for received results</param>
/// <returns>how many replies were actually received</returns>
public int RawQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.RawQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal int doQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.doQueryToBA(false, nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal static void splitObs(object g, out Timeout timeout, ref object[] obs, out object[] refs)
{
splitObs(g, out timeout, ref obs);
int nRefs = 0;
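// Test with "is" rather than GetType() so that a null argument does not raise a NullReferenceException during this scan;
// null reply variables are rejected by the explicit check below.
while (nRefs < obs.Length && !(obs[(obs.Length - 1) - nRefs] is EOLMarker))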
{
++nRefs;
}
if (nRefs == obs.Length)
{
refs = new object[0];
return;
}
refs = new object[nRefs];
for (int i = 0; i < nRefs; i++)
{
try
{
refs[i] = obs[obs.Length - nRefs + i];
if (refs[i] == null)
{
throw new ArgumentException();
}
// At this point we used to complain if the length wasn't initially 0, but these days we allow
// "Accumulators" hence the exception is no longer thrown. But the logic above still is useful:
// an exception will be thrown if something is wrong (better to throw it when the call is first made rather than later..).
}
catch (Exception e)
{
throw new ArgumentException("In a Vsync Query, all reply variables must be lists", e);
}
}
Vsync.ArrayResize(ref obs, obs.Length - nRefs - 1);
}
internal static void splitObs(object g, out Timeout timeout, ref object[] obs)
{
checkGroupIsOpen(g);
if (obs[0].GetType() == typeof(Timeout))
{
timeout = (Timeout)obs[0];
for (int i = 0; i < obs.Length - 1; i++)
{
obs[i] = obs[i + 1];
}
Vsync.ArrayResize(ref obs, obs.Length - 1);
}
else
{
timeout = new Timeout(Vsync.VSYNC_DEFAULTTIMEOUT, Timeout.TO_ABORTREPLY);
}
obs = subsetShortcut(obs);
}
internal void QueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.QueryToBA(nreplies, timeout, obs), del);
}
internal List<byte[]> doQueryToBA(bool sentByOracle, int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
querySender asyncSendQuery;
if (sentByOracle)
{
asyncSendQuery = this.doSendFromOracle;
}
else
{
asyncSendQuery = this.doSendNotFromOracle;
}
return this._Query(sentByOracle, false, asyncSendQuery, nreplies, timeout, obs);
}
internal void doQueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.doQueryToBA(false, nreplies, timeout, obs), del);
}
/// <summary>
/// Issues an ordered but potentially non-durable multicast to the group and waits for replies
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters to method being invoked, EOLMarker, vectors for received results</param>
/// <returns>how many replies were actually received</returns>
/// <remarks>
/// Issues an ordered but potentially non-durable multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
///
/// This is a slightly slower form of virtually synchronous query, and it may not be durable: it can be lost in the event of a failure. Can be done in a single IP multicast if
/// the sender was the group leader but if not, delivery will be delayed until the group leader decides and notifies members of the ordering to use.
/// Read about the <it>virtual synchrony model</it> to learn more about when OrderedQuery is needed.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. The call returns the number of replies received; the replies themselves
/// are stored in a series of lists, which the caller passes after a marker, the Vsync EOLmarker, that separates
/// the arguments of the invoked method from the places to put received replies.
///
/// For example: List&lt;int&gt; hisAge = new List&lt;int&gt;(); int nreps = OrderedQuery(1, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith", Vsync.EOLmarker, hisAge);
/// </remarks>
public int OrderedQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.OrderedQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal int doOrderedQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.doOrderedQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
/// <summary>
/// Issues a totally ordered (but not necessarily durable) multicast to the group and waits for replies.
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="obs">variable-length list specifying method to invoke, timeout, parameters to method being invoked</param>
/// <returns>a List containing one byte[] vector per received reply</returns>
/// <remarks>
/// Issues a totally ordered (but not necessarily durable) multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. Returns a List in which each entry is a reply from one process.
/// Normally, the user would employ Msg.BArraysToObjects() or Msg.InvokeFromBArrays() to decode these replies.
///
/// This is a slightly slower form of virtually synchronous query, and it may not be durable: it can be lost in the event of a failure. Can be done in a single IP multicast if
/// the sender was the group leader but if not, delivery will be delayed until the group leader decides and notifies members of the ordering to use.
/// Read about the <it>virtual synchrony model</it> to learn more about when OrderedQuery is needed.
/// </remarks>
public List<byte[]> OrderedQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.OrderedSend, nreplies, timeout, obs);
}
/// <summary>
/// Issues a causally ordered but potentially non-durable multicast to the group and waits for replies
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters to method being invoked, EOLMarker, vectors for received results</param>
/// <returns>how many replies were actually received</returns>
/// <remarks>
/// Issues a causally ordered but potentially non-durable multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
///
/// This is a slightly slower form of virtually synchronous query, and it may not be durable: it can be lost in the event of a failure. Can be done in a single IP multicast
/// but delivery might be slightly delayed if the causal ordering would otherwise be violated.
/// Read about the <it>virtual synchrony model</it> to learn more about when CausalQuery is needed.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. The call returns the number of replies received; the replies themselves
/// are stored in a series of lists, which the caller passes after a marker, the Vsync EOLmarker, that separates
/// the arguments of the invoked method from the places to put received replies.
///
/// For example: List&lt;int&gt; hisAge = new List&lt;int&gt;(); int nreps = CausalQuery(1, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith", Vsync.EOLmarker, hisAge);
/// </remarks>
public int CausalQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.CausalQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
/// <summary>
/// Issues a causally ordered (but not necessarily durable) multicast to the group and waits for replies.
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="obs">variable-length list specifying method to invoke, timeout, parameters to method being invoked</param>
/// <returns>a List containing one byte[] vector per received reply</returns>
/// <remarks>
/// Issues a causally ordered (but not necessarily durable) multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. Returns a List in which each entry is a reply from one process.
/// Normally, the user would employ Msg.BArraysToObjects() or Msg.InvokeFromBArrays() to decode these replies.
///
/// This is a slightly slower form of virtually synchronous query, and it may not be durable: it can be lost in the event of a failure. Can be done in a single IP multicast
/// but delivery might be slightly delayed if the causal ordering would otherwise be violated.
/// Read about the <it>virtual synchrony model</it> to learn more about when CausalQuery is needed.
/// </remarks>
public List<byte[]> CausalQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.CausalSend, nreplies, timeout, obs);
}
internal List<byte[]> doOrderedQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.OrderedSend, nreplies, timeout, obs);
}
internal void doOrderedQueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.doOrderedQueryToBA(nreplies, timeout, obs), del);
}
/// <summary>
/// Issues a Paxos-style (ordered, durable) multicast to the group and returns a List containing one byte[] vector per reply.
/// </summary>
/// <param name="nreplies">number of replies desired</param>
/// <param name="obs">variable-length list specifying method to invoke, timeout, parameters to method being invoked</param>
/// <returns>a List containing one byte[] vector per received reply</returns>
/// <remarks>
/// Issues a Paxos-style (ordered, durable) multicast to the group, waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>),
/// and returns them as a List containing one byte[] vector per reply.
///
/// This is the slowest but most robust form of virtually synchronous query, and matches the State Machine Replication model. Requires a form of internal 2-phase commit, which
/// will involve round-trip acks from a majority of group members. Read about the <it>virtual synchrony model</it> to learn more about when SafeQuery is needed.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. Returns a List in which each entry is a reply from one process.
/// Normally, the user would employ Msg.BArraysToObjects() or Msg.InvokeFromBArrays() to decode these replies.
/// </remarks>
public List<byte[]> SafeQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
if (!VsyncSystem.VsyncActive)
{
throw new VsyncShutdownException("Vsync inactive");
}
return this._Query(false, false, this.SafeSend, nreplies, timeout, obs);
}
/// <summary>
/// Issues a Paxos-style (ordered, durable) multicast to the group and waits for replies
/// </summary>
/// <param name="nreplies">number of replies desired, or MAJORITY, or ALL</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters, EOLMarker, and result vectors</param>
/// <returns>number of replies actually received</returns>
/// <remarks>
/// Issues a Paxos-style (ordered, durable) multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
///
/// This is the slowest but most robust form of virtually synchronous query, and matches the State Machine Replication model. Requires a form of internal 2-phase commit, which
/// will involve round-trip acks from a majority of group members. Read about the <it>virtual synchrony model</it> to learn more about when SafeQuery is needed.
///
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. The call returns the number of replies received; the replies themselves
/// are stored in the lists that the caller passes after the Vsync EOLmarker.
///
/// For example: List&lt;int&gt; hisAge = new List&lt;int&gt;(); int nreps = SafeQuery(1, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith", Vsync.EOLmarker, hisAge);
/// </remarks>
public int SafeQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.SafeQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal int doSafeQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.doSafeQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal void SafeQueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.SafeQueryToBA(nreplies, timeout, obs), del);
}
internal List<byte[]> doSafeQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.SafeSend, nreplies, timeout, obs);
}
internal void doSafeQueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.doSafeQueryToBA(nreplies, timeout, obs), del);
}
internal List<byte[]> doUnorderedQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.doUnorderedSend, nreplies, timeout, obs);
}
internal List<byte[]> UnorderedQueryToBA(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
return this._Query(false, false, this.doUnorderedSend, nreplies, timeout, obs);
}
/// <summary>
/// Issues an unordered multicast to the group and waits for replies
/// </summary>
/// <param name="nreplies">number of replies desired, or MAJORITY, or ALL</param>
/// <param name="given">variable-length list specifying method to invoke, timeout, parameters, EOLMarker, and result vectors</param>
/// <returns>number of replies actually received</returns>
/// <remarks>
/// Issues an unordered multicast to the group and waits for replies (<see cref="Reply"/>, <see cref="NullReply"/> and <see cref="AbortReply"/>).
/// The number of replies desired can be specified as an integer (normally 1 or 2), or as one of the special constants ALL or MAJORITY. The first
/// parameter after nreplies may be an object of type <see cref="Timeout"/>, which specifies how long the Query waits for a non-responsive
/// member and the default action to take if that time elapses; if it is omitted, a default timeout is used. The next parameter is the request handle: a small integer identifying this request.
/// The remaining parameters become typed arguments to the handler for the request. The call returns the number of replies received; the replies themselves
/// are stored in the lists that the caller passes after the Vsync EOLmarker.
/// </remarks>
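/// <example>
/// A minimal usage sketch, not taken from the Vsync sources. It assumes a joined Group g, that the MAJORITY constant
/// used by this class is reachable as Group.MAJORITY, and an application-defined request code GET_AGE
/// (g, GET_AGE and ages are hypothetical names):
/// <code>
/// var ages = new List&lt;int&gt;();
/// // Timeout, request code and handler arguments come first; the lists after the EOL marker receive the replies.
/// int nreps = g.UnorderedQuery(Group.MAJORITY, new Timeout(1000, Timeout.TO_ABORTREPLY), GET_AGE, "John Smith", Vsync.EOLmarker, ages);
/// // nreps is the number of replies collected; it can be lower than requested if members time out.
/// </code>
/// </example>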
public int UnorderedQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.UnorderedQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal int doUnorderedQuery(int nreplies, params object[] given)
{
Timeout timeout;
object[] resRefs;
splitObs(this, out timeout, ref given, out resRefs);
List<byte[]> barrays = this.doUnorderedQueryToBA(nreplies, timeout, given);
Msg.BArraysToLists(resRefs, barrays);
return barrays.Count;
}
internal void doUnorderedQueryInvoke(int nreplies, params object[] obs)
{
Timeout timeout;
splitObs(this, out timeout, ref obs);
Delegate del = (Delegate)obs[obs.Length - 1];
Vsync.ArrayResize(ref obs, obs.Length - 1);
Msg.InvokeFromBArrays(this.doUnorderedQueryToBA(nreplies, timeout, obs), del);
}
private List<byte[]> _Query(bool sentByOracle, bool isRaw, querySender AsyncSendQuery, int nreplies, Timeout timeout, object[] obs)
{
return this._Query(Msg.UNINITIALIZED, Msg.UNINITIALIZED, 0, sentByOracle, isRaw, AsyncSendQuery, nreplies, timeout, obs);
}
private List<byte[]> _Query(int vid, int mid, int nRaw, bool sentByOracle, bool isRaw, querySender AsyncSendQuery, int nreplies, Timeout timeout, object[] obs)
{
if (!this.VsyncCallStart())
{
return new List<byte[]>();
}
long mylid = 0;
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_QUERY, IL_START, Vsync.my_address, mylid = this.newLoggingId(), obs);
}
if (!this.GroupOpen)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWWAIT | VsyncSystem.REPLYWAIT)) != 0)
{
Vsync.WriteLine("Query to a group with GroupOpen==false: return new byte[0][]");
}
this.VsyncCallDone();
return new List<byte[]>();
}
if (obs == null || obs.Length < 1)
{
throw new ArgumentNullException("obs", "Vsync.Group.query");
}
Msg m;
if (obs.Length == 1 && obs[0].GetType() == typeof(Msg))
{
m = (Msg)obs[0];
}
else
{
object dests = null;
if (obs.Length == 1 && obs[0].GetType() == typeof(object[]) && ((object[])obs[0]).Length > 0 && IsSubsetSend(((object[])obs[0])[0]))
{
dests = GetDests((object[])obs[0]);
isRaw = true;
}
else if (obs.Length > 0 && IsSubsetSend(obs[0]))
{
dests = GetDests(obs);
obs = fixObs(obs);
isRaw = true;
}
this.cbCheck(obs);
m = new Msg(obs);
// Used to pass additional destination information to the doTheSend code that needs it
if (dests != null)
{
if (dests.GetType() == typeof(List<Address>))
{
m.destList = (List<Address>)dests;
}
else if (dests.GetType() == typeof(Address[]))
{
m.destList = ((Address[])dests).ToList();
}
else
{
m.destList = ((QKD)dests).GetDests(this);
}
}
}
if (vid != Msg.UNINITIALIZED)
{
m.vid = vid;
m.msgid = mid;
m.nRaw = nRaw;
}
else
{
this.SetMsgIds(m, sentByOracle, isRaw);
}
m.flags |= Msg.NEEDSREPLY;
if ((this.flags & G_SECURE) != 0)
{
m.flags |= Msg.ENCIPHEREDREPLY;
}
if (sentByOracle)
{
m.flags |= Msg.SENTBYORACLE;
}
if (nreplies == 0 || !this.HasFirstView)
{
if ((VsyncSystem.Debug & (VsyncSystem.VIEWWAIT | VsyncSystem.REPLYWAIT)) != 0)
{
Vsync.WriteLine("Query to group<" + this.gname + "> with nreplies=" + nreplies + ", and HasFirstView=" + this.HasFirstView + ": return new byte[0][]");
}
this.VsyncCallDone();
return new List<byte[]>();
}
if (nreplies == ALL)
{
nreplies = this.theView.members.Length;
}
else if (nreplies == MAJORITY)
{
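// Note: (n + 1) / 2 rounds n / 2 up, so in a view with an even number of members this is exactly half of them, not a strict majority.
nreplies = (this.theView.members.Length + 1) / 2;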
}
nreplies = Math.Min(nreplies, this.theView.members.Length);
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWCHANGE)) != 0)
{
Vsync.WriteLine("Registering a wait structure... group " + this.gname + ", wait-id " + m.vid + ":" + m.msgid + ", nreplies " + nreplies);
}
AwaitReplies.ReplyInfo ri = AwaitReplies.registerGroupWait(this, m.vid, m.msgid, m.destList, sentByOracle, (this.flags & G_SECURE) != 0, nreplies, timeout.when);
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("Sending my query: group " + this.gname + " msgid " + m.vid + ":" + m.msgid + ", ri.wanted=" + ri.replies_wanted);
}
AsyncSendQuery(m);
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWCHANGE | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("After sending my query: group " + this.gname + " msgid " + m.vid + ":" + m.msgid + ".... Collecting the replies...");
}
AwaitReplies.awaitReplies(ri, this, timeout, m.destList);
if ((VsyncSystem.Debug & VsyncSystem.REPLYWAIT) != 0)
{
Vsync.WriteLine("Collected replies, list contains " + ri.rdvReplies.Count);
}
if ((this.flags & G_SECURE) != 0)
{
this.DecipherReplies(ri);
}
if (this.myLoggingFcn != null)
{
this.myLoggingFcn(IL_QUERY, IL_DONE, Vsync.my_address, mylid);
}
this.VsyncCallDone();
return ri.rdvReplies;
}
internal void DecipherReplies(AwaitReplies.ReplyInfo ri)
{
List<byte[]> tmp = new List<byte[]>();
foreach (byte[] r in ri.rdvReplies)
{
tmp.Add(this.decipherBuf(r));
}
ri.rdvReplies = tmp;
}
internal class querierArgs
{
internal List<byte[]>[] ba;
internal List<byte[]> bap2p;
internal Group g;
internal int nr;
internal bool sentByOracle;
internal int whoAmI;
internal Timeout timeout;
internal object[] obs;
internal querierArgs(List<byte[]>[] b, Group group, int n, bool bo, int w, Timeout to, object[] o)
{
this.ba = b;
this.g = group;
this.nr = n;
this.sentByOracle = bo;
this.whoAmI = w;
this.timeout = to;
this.obs = o;
}
internal querierArgs(List<byte[]> bp2p, Group group, int w, Timeout to, object[] o)
{
this.ba = null;
this.bap2p = bp2p;
this.g = group;
this.whoAmI = w;
this.timeout = to;
this.obs = o;
this.sentByOracle = false;
}
}
internal static List<byte[]>[] doMultiQuery(List<Group> glist, int nreplies, bool sentByOracle, params object[] obs)
{
Timeout timeout;
splitObs(glist, out timeout, ref obs);
int ng = glist.Count;
int gn = 0;
List<byte[]>[] ba = new List<byte[]>[ng];
Thread[] myThreads = new Thread[ng];
// This is kind of gross but easier than the alternative. Probably should consider recoding it
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWWAIT)) != 0)
{
string s = "Doing a MultiQuery in groups ";
foreach (Group g in glist)
{
s += g.gname + " ";
}
Vsync.WriteLine(s);
}
foreach (Group g in glist)
{
myThreads[gn] = new Thread(myQuerier) { Name = "VSYNC MultiQuery thread for <" + g.gname + ">", IsBackground = true };
querierArgs qa = new querierArgs(ba, g, nreplies, sentByOracle, gn, timeout, obs);
myThreads[gn].Start(qa);
++gn;
}
foreach (Thread t in myThreads)
{
t.Join();
}
return ba;
}
internal static List<byte[]> doMultiP2PQuery(List<Group> glist, params object[] obs)
{
Timeout timeout;
splitObs(glist, out timeout, ref obs);
int ng = glist.Count;
int gn = 0;
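// Pre-size with ng null slots: myQuerier stores each reply by index, and indexing an empty List throws
List<byte[]> ba = new List<byte[]>(new byte[ng][]);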
Thread[] myThreads = new Thread[ng];
// This is kind of gross but easier than the alternative. Probably should consider recoding it
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWWAIT)) != 0)
{
string s = "Doing a MultiP2PQuery in groups ";
foreach (Group g in glist)
{
s += g.gname + " ";
}
Vsync.WriteLine(s);
}
foreach (Group g in glist)
{
myThreads[gn] = new Thread(myQuerier) { Name = "VSYNC MultiP2PQuery thread for <" + g.gname + ">", IsBackground = true };
querierArgs qa = new querierArgs(ba, g, gn, timeout, obs);
myThreads[gn].Start(qa);
++gn;
}
foreach (Thread t in myThreads)
{
t.Join();
}
return ba;
}
private static void myQuerier(object o)
{
querierArgs qa = (querierArgs)o;
try
{
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("before multiQuery(" + qa.g.gname + "): index[" + qa.whoAmI + "] ... nreplies " + qa.nr);
}
if (qa.ba != null)
{
qa.ba[qa.whoAmI] = qa.g.doQueryToBA(qa.sentByOracle, qa.nr, qa.timeout, qa.obs);
}
else
{
qa.bap2p[qa.whoAmI] = qa.g.doP2PQuery(qa.g.theView.members[0], qa.timeout, qa.obs);
}
if ((VsyncSystem.Debug & (VsyncSystem.REPLYWAIT | VsyncSystem.VIEWWAIT)) != 0)
{
Vsync.WriteLine("after multiQuery(" + qa.g.gname + "): index[" + qa.whoAmI + "] reply len " + ((qa.ba == null) ? qa.bap2p[qa.whoAmI].Length : qa.ba[qa.whoAmI].Count));
}
}
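catch (VsyncShutdownException)
{
// qa.ba is null for P2P queries, so only the multi-group form has a per-group slot to reset here
if (qa.ba != null)
{
qa.ba[qa.whoAmI] = new List<byte[]>();
}
}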
VsyncSystem.ThreadTerminationMagic();
}
internal static void FixUp(Address Old, Address New)
{
if(Old == null || New == null)
{
return;
}
var gClone = VsyncGroupsClone();
foreach(var g in gClone)
{
using (var tmpLockObj = new LockAndElevate(g.ToDoLock))
{
foreach (var m in g.ToDo)
{
if (Old.Equals(m.sender))
{
m.sender = New;
}
}
}
}
}
internal void ReplayToDo()
{
if (this.ToDoCount == 0)
{
return;
}
using (Semaphore ReplayWait = new Semaphore(0, int.MaxValue))
{
using (var tmpLockObj = new LockAndElevate(this.ToDoLock))
{
new Thread(() =>
{
try
{
List<Msg> oldToDo;
int vid = 0;
using (var tmpLockObj1 = new LockAndElevate(this.ToDoLock))
{
oldToDo = this.ToDo;
this.ToDo = new List<Msg>();
this.ToDoCount = 0;
}
if (oldToDo.Count > 0)
{
foreach (Msg m in oldToDo)
{
if(Vsync.MY_OLD_MASTER != null && Vsync.MY_OLD_MASTER.Equals(m.sender))
{
m.sender = Vsync.MY_MASTER;
}
using (var tmpLockObj1 = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
vid = this.theView.viewid;
}
}
if (m.vid > vid || !this.GotAMsg(m, Msg.MULTICAST, "replayToDo"))
{
using (var tmpLockObj1 = new LockAndElevate(this.ToDoLock))
{
if (m.toDoTime == 0)
{
m.toDoTime = Vsync.NOW;
}
else if ((Vsync.NOW - m.toDoTime) > Vsync.VSYNC_DEFAULTTIMEOUT * 6 && Vsync.VSYNC_SHUTDOWNIFOVERLOADED)
{
throw new VsyncException("Vsync is shutting down due to extremely long scheduling delays. Is your computer unusually overloaded?");
}
this.ToDo.Add(m);
this.ToDoCount++;
}
}
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
VsyncSystem.ThreadTerminationMagic();
ReplayWait.Release();
}) { Name = "ToDo Replay Thread", IsBackground = true }.Start();
}
ILock.NoteThreadState("ReplayWait.WaitOne()");
ReplayWait.WaitOne();
ILock.NoteThreadState(null);
}
}
internal void CheckCausalWaitQueue()
{
using (var tmpLockObj = new LockAndElevate(this.CausalOrderListLock))
{
ctuple ct = this.CausalOrderList.FirstOrDefault();
if (ct != null && (Vsync.NOW - ct.whenEnqueued) > Vsync.VSYNC_DEFAULTTIMEOUT * 6)
{
throw new VsyncException("Vsync causal send: message(s) trapped on the causal delivery queue for too long");
}
}
}
internal int StabilityCbPending = -1;
internal void isStable(Address who, int n)
{
if (who == null || n < 0)
{
return;
}
using (var tmpLockObj = new LockAndElevate(this.ViewLock))
{
if (this.theView != null)
{
int rank = this.theView.isLarge ? -1 : this.theView.GetRankOf(who);
if (rank != -1)
{
this.theView.StableTo[rank + 1] = Math.Max(this.theView.StableTo[rank + 1], n);
}
}
}
using (var tmpLockObj = new LockAndElevate(this.UnstableLock))
{
List<Msg> tmpUnstable = new List<Msg>();
foreach (Msg m in this.Unstable)
{
if (m.msgid > n || m.sender != who)
{
tmpUnstable.Add(m);
}
}
this.Unstable = tmpUnstable;
this.UnstableCount = this.Unstable.Count;
}
}
internal static LockObject slock = new LockObject("SendStabilityLock");
internal static bool sending = false;
internal static void SendStability()
{
List<Group> wantsStabilitySent = new List<Group>();
List<Group> igc = Group.VsyncGroupsClone();
// Make sure we don't launch too many of these threads at a time
using (var tmpLockObj = new LockAndElevate(slock))
{
if (sending)
{
return;
}
sending = true;
}
new Thread(() =>
{
// Do the action as a thread because we're called from the resender thread
try
{
foreach (Group g in igc)
{
View theView;
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
theView = g.theView;
}
if (theView == null)
{
continue;
}
using (var tmpLockObj = new LockAndElevate(g.GroupFlagsLock))
{
if (((g.flags & Group.G_SENDINGSTABILITY) == 0 && ((theView.minStable < theView.lastStabilitySent || g.CurrentBacklog != g.PreviousBacklog) && (Vsync.NOW - g.SentStableAt) > 100)) || ((FlowControl.Waiting > 0 || ReliableSender.rWaiting > 0) && (Vsync.NOW - g.SentStableAt) > 1000))
{
wantsStabilitySent.Add(g);
}
}
}
foreach (Group g in wantsStabilitySent)
{
if (ReliableSender.doSendStability(g))
{
using (var tmpLockObj = new LockAndElevate(g.GroupFlagsLock))
{
g.CurrentBacklog = 0;
}
}
}
}
catch (VsyncShutdownException)
{
VsyncSystem.CheckLocksHeld();
}
using (var tmpLockObj = new LockAndElevate(slock))
{
sending = false;
}
VsyncSystem.ThreadTerminationMagic();
}) { Name = "Sending stability", Priority = ThreadPriority.Highest, IsBackground = true }.Start();
}
/// <exclude>
/// <summary>
/// Internal for use by Vsync; public only to satisfy C# scope rules.
/// </summary>
/// </exclude>
#if PROTOCOL_BUFFERS
[ProtoContract(SkipConstructor = true)]
#else
[AutoMarshalled]
#endif
public class FlushAggKey : IEquatable<FlushAggKey>
{
/// <exclude>
/// <summary>
/// Internal for use by Vsync
/// </summary>
/// </exclude>
[ProtoMember(1)]
public readonly Address who;
/// <exclude>
/// <summary>
/// Internal for use by Vsync
/// </summary>
/// </exclude>
[ProtoMember(2)]
public readonly int state;
#if !PROTOCOL_BUFFERS
/// <exclude></exclude>
public FlushAggKey()
{
}
#endif
internal FlushAggKey(Address a, int s)
{
this.who = a;
this.state = s;
}
/// <exclude>
/// <summary>
/// Internal for use by Vsync
/// </summary>
/// <returns>string encoding the state of the flush aggregator</returns>
/// </exclude>
public override string ToString()
{
return this.who + ((this.state == tokenInfo.SETSTABLETO) ? "|1" : "|0");
}
/// <exclude></exclude>
public static bool operator ==(FlushAggKey first, FlushAggKey second)
{
return Equals(first, second);
}
/// <exclude></exclude>
public static bool operator !=(FlushAggKey first, FlushAggKey second)
{
return !Equals(first, second);
}
/// <exclude>
/// <summary>
/// Equality comparison
/// </summary>
/// <param name="first">First comparison target</param>
/// <param name="second">Second comparison target</param>
/// </exclude>
public static bool Equals(FlushAggKey first, FlushAggKey second)
{
if (object.ReferenceEquals(first, second))
{
return true;
}
if (object.ReferenceEquals(first, null) || object.ReferenceEquals(second, null))
{
return false;
}
return first.who == second.who && first.state == second.state;
}
/// <exclude>
/// <summary>
/// Equality comparison
/// </summary>
/// <param name="other">comparison target</param>
/// </exclude>
public override bool Equals(object other)
{
return Equals(this, other as FlushAggKey);
}
/// <exclude>
/// <summary>
/// Equality comparison
/// </summary>
/// <param name="other">comparison target</param>
/// </exclude>
public bool Equals(FlushAggKey other)
{
return Equals(this, other);
}
/// <exclude>
/// <summary>
/// Required hashcode method
/// </summary>
/// <returns>the hash code</returns>
/// </exclude>
public override int GetHashCode()
{
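// who can be null when the instance was created by the parameterless constructor
return (object.ReferenceEquals(this.who, null) ? 0 : this.who.GetHashCode()) + (this.state * 1717);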
}
}
/// <exclude>
/// <summary>
/// Internal, represents the tokens employed in the Vsync token-tree algorithm. Declared public to comply with C# scoping rules.
/// </summary>
/// </exclude>
public class tokenInfo : ISelfMarshalled
{
// Wire portion only includes these fields
internal Address gaddr; // The group in which this token is circulating
internal Address groupOwner; // The sender for that group, used because the members don't track
internal Address sender; // Most recent sender, useful for debugging
internal Vsync.ViewDelta[] viewDeltas;
// Recent View Deltas, sent as a vector when some kind of membership event occurs
internal volatile int logicalClock;
internal int state; // State currently has just 3 values
internal int stableAtSender; // Stable up to this msgid.
internal int viewid; // Viewid for this token
internal int alsoSeenBase; // alsoSeen is computed with respect to this base
internal long alsoSeen; // Up to 64 bits for ids > stable that have been seen (used in large groups)
internal int tokenLevel; // Level of the token tree at which this token was sent
internal int aggStable; // Aggregated stable value for nodes within the specified level, for transmission
internal int stableTo; // Propagates out from the group leader and triggers garbage collection
internal byte[][] incomingValuesArray; // Extracted on incoming token, undefined for outgoing token
// State values
internal const int NORMAL = 0; // Large group in the normal operational mode
internal const int INQUIRY = 1; // New leader inquiring about member states
internal const int SETSTABLETO = 2;
// New leader is promulgating the initial "stableto" value for its first view
// Not transmitted (inferred on token arrival)
internal const int RINGSIZE = 8;
// 25 worked well in Quicksilver Scalable Multicast. Must be >= 2! Guess: optimal is log(N)
// Per-group values used within the token algorithms
internal long whenReceived; // Time when this token was received
internal volatile bool inhibitResenderLoop;
// While stabilizing after a large-group membership change, inhibits resender loop temporarily
internal long gotAllAt; // ... associated delay timer
internal int unstableVIDMID; // Multicast ID number that was used to cast a view id that isn't stable yet
internal int unstableVID; // Associated viewID
internal int stableVID; // Max view id to have become stable so far
internal LockObject slock = new LockObject("token.slock"); // Protects the VID fields
internal Group theGroup;
internal View WorkingView;
internal long WorkingViewInstalledAt;
internal bool IAmLgOwner = false;
internal bool[] IAmRank0;
internal ILock FlushingBarrier; // Used to wait while flushing is running
internal LockObject FlushingBarrierLock = new LockObject("token.FlushingBarrierLock");
// Better safe than sorry!
internal int nlevels;
internal int[] mySubgroupIdx;
internal int[] myOffset;
internal int[] StableByLevel;
// Used in the rank-0 members of each ring, StableByLevel[i] is the value of aggStable received from the last guy in ring [i]
internal int[] includeViewDeltas; // Tells me if the view delta vector for this level needs to be included
internal tokenInfo[] lastToken; // Last token I received
internal Address[] next;
internal Address[] last;
internal bool[] lastValidated; // I've received at least one token from last since prior reset
internal bool[] sentAToken; // True if I sent a token to try and "push" last[i] into sync with me
internal bool[] pinged; // True if I didn't get a token from you so I pinged you
internal int[] tokenInMotion; // Counts the number of tokens to next[level] that are "in motion"
internal LockObject tokenInMotionLock = new LockObject("tokenInMotionLock");
internal long resetTime;
internal tokenInfo(Group g)
{
if (g.theToken == null || g.theToken.viewDeltas == null)
{
this.viewDeltas = new Vsync.ViewDelta[0];
this.WorkingView = g.theView;
}
else
{
this.viewDeltas = g.theToken.viewDeltas;
this.WorkingView = g.theToken.WorkingView;
}
this.WorkingViewInstalledAt = Vsync.NOW;
this.theGroup = g;
this.reinitializeToken(g);
this.IAmLgOwner = g.theView.GetMyRank() == 0;
}
// Caller must hold g.Lock and g.tokenLock
internal static void newToken(Group g)
{
tokenInfo oldToken = g.theToken;
g.theToken = new tokenInfo(g);
if (oldToken != null)
{
g.theToken.stableTo = oldToken.stableTo;
g.theToken.logicalClock = oldToken.logicalClock + 1;
}
if (oldToken == null || oldToken.theGroup.AggList == null)
{
return;
}
g.theToken.logicalClock = oldToken.logicalClock;
if (oldToken.IAmLgOwner)
{
foreach (LinkedList<object> item in oldToken.theGroup.AggList)
{
foreach (IAggregateEventHandler ae in item)
{
ae.AggEvent(Group.BreakWaits);
}
}
}
else if ((VsyncSystem.Debug & VsyncSystem.TOKENFLUSH) != 0)
{
Vsync.WriteLine("<" + g.gname + ">: Inhibit breakwaits (I wasn't the previous LgOwner)");
}
}
internal void resetStableByLevel(Group g)
{
if ((VsyncSystem.Debug & VsyncSystem.TOKENSTABILITY) != 0)
{
Vsync.WriteLine("<" + g.gname + ">: LogicalClock=" + this.logicalClock + ", reset StableByLevel[*]=-1");
}
for (int level = 0; level < this.StableByLevel.Length; level++)
{
this.StableByLevel[level] = -1;
}
}
// Caller holds g.tokenLock
internal void reinitializeToken(Group g)
{
Address[] Mlist = this.WorkingView.members;
int NMemb = Mlist.Length;
int myrank = this.WorkingView.GetMyRank();
if (NMemb == 0 || myrank == -1)
{
return;
}
this.nlevels = 0;
for (int n = NMemb; n > 0; n /= RINGSIZE)
{
this.nlevels++;
}
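// Worked example (illustrative; assumes RINGSIZE = 8 as declared above and integer division):
//   NMemb = 7   -> 7, 0             -> nlevels = 1
//   NMemb = 8   -> 8, 1, 0          -> nlevels = 2
//   NMemb = 100 -> 100, 12, 1, 0    -> nlevels = 3
//   NMemb = 512 -> 512, 64, 8, 1, 0 -> nlevels = 4
// That is, nlevels = floor(log8(NMemb)) + 1 before the per-member trimming done further down.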
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine(" REINITIALIZE TOKEN FOR GROUP <" + this.theGroup.gname + ">, view " + this.WorkingView.viewid + " NMemb=" + NMemb + " nlevels=" + this.nlevels);
}
this.resetTime = Vsync.NOW;
this.mySubgroupIdx = new int[this.nlevels];
this.myOffset = new int[this.nlevels];
this.next = new Address[this.nlevels];
this.last = new Address[this.nlevels];
this.lastValidated = new bool[this.nlevels];
this.sentAToken = new bool[this.nlevels];
this.pinged = new bool[this.nlevels];
ReliableSender.CleanLgCallbacks(g);
using (var tmpLockObj = new LockAndElevate(this.tokenInMotionLock))
{
this.tokenInMotion = new int[this.nlevels];
}
this.StableByLevel = new int[this.nlevels];
this.lastToken = new tokenInfo[this.nlevels];
this.includeViewDeltas = new int[this.nlevels];
this.IAmRank0 = new bool[this.nlevels];
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("Reinitialize the AggList[nlevels=" + this.nlevels + "] from the AggTypes List in <" + g.gname + ">... (it lists " + g.AggTypes.Count + " types)");
}
this.theGroup.AggList = new LinkedList<object>[this.nlevels];
this.viewid = this.WorkingView.viewid;
using (var tmpLockObj = new LockAndElevate(this.theGroup.AggListLock))
{
binfo.resetBarrierList();
for (int n = 0; n < this.nlevels; n++)
{
this.theGroup.AggList[n] = new LinkedList<object>();
foreach (AggInfo ag in g.AggTypes)
{
if ((VsyncSystem.Debug & VsyncSystem.AGGREGATION) != 0)
{
Vsync.WriteLine("Calling constructor in <" + g.gname + "> to allocate a new aggregator of type " + ag.KVT);
}
// These are actually "aggregator" objects of some derived type
this.theGroup.AggList[n].AddLast(ag.myFactory.Invoke(new[] { ag.theGroup, this.viewid, n, ag.theDel, ag.theTimeout }));
}
}
}
if (this.viewDeltas.Length != 0)
{
for (int n = 0; n < this.nlevels; n++)
{
this.includeViewDeltas[n] = 4;
}
}
this.groupOwner = Mlist[0];
this.stableAtSender = this.aggStable = this.stableTo = -1;
this.gaddr = this.theGroup.gaddr;
this.sender = Vsync.my_address;
bool inNextLevel = true;
int stride = 1;
int ringsize = RINGSIZE;
for (int i = 0; i < this.nlevels; i++)
{
if (inNextLevel)
{
int residue = -1;
for (int delta = 0; delta < 15; delta++)
{
int theResidue = NMemb % (ringsize - delta);
if (theResidue == 0)
{
break;
}
if (theResidue > residue)
{
residue = NMemb % (ringsize - delta);
}
}
this.mySubgroupIdx[i] = myrank / ringsize;
this.myOffset[i] = myrank % ringsize;
this.IAmRank0[i] = this.myOffset[i] == 0;
int sgBase = myrank - this.myOffset[i];
int idx = sgBase + ((this.myOffset[i] + stride) % ringsize);
if (idx >= NMemb)
{
idx = sgBase;
}
this.next[i] = Mlist[idx];
int lidx = myrank - stride;
if (lidx < sgBase)
{
lidx += ringsize;
while (lidx >= NMemb)
{
lidx -= stride;
}
}
this.last[i] = Mlist[lidx];
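// Worked example (illustrative; assumes RINGSIZE = 8 and a 20-member view, so level 0 has ringsize = 8, stride = 1
// and level 1 has ringsize = 64, stride = 8):
//   level 0, myrank = 10: myOffset = 2, sgBase = 8, next = Mlist[11], last = Mlist[9]
//   level 0, myrank = 8:  myOffset = 0, sgBase = 8, next = Mlist[9], last = Mlist[15] (7 < sgBase, so lidx wraps by ringsize)
//   level 1, myrank = 16: myOffset = 16, sgBase = 0, idx = 24 >= NMemb so next = Mlist[0], last = Mlist[8]
//   level 1, myrank = 0:  next = Mlist[8]; lidx = -8 wraps to 56, then steps down by stride to 16, so last = Mlist[16]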
this.lastValidated[i] = false;
this.sentAToken[i] = false;
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("Token layer[" + i + "] initializing next/last: MemberList = " + Address.VectorToString(Mlist) + ", myRank=" + myrank + ", ringsize=" + ringsize + ", sgBase=" + sgBase + ", last[" + i + "]=" + this.last[i] + ", next[" + i + "]=" + this.next[i]);
}
inNextLevel = myrank % ringsize == 0;
stride = ringsize;
ringsize *= RINGSIZE;
}
else
{
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("Token layer[" + i + "] not in at this level, setting nlevels=" + i);
}
this.nlevels = i;
}
this.alsoSeen = 0;
this.StableByLevel[i] = -1;
}
if (this.groupOwner.isMyAddress() && (this.next[this.nlevels - 1] == null || this.next[this.nlevels - 1].isMyAddress()))
{
this.nlevels = Math.Max(1, this.nlevels - 1);
}
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("... reinitialized token = " + this);
}
if ((VsyncSystem.Debug & VsyncSystem.TOKENSTABILITY) != 0)
{
Vsync.WriteLine("<" + g.gname + ">: New view = " + Address.VectorToString(Mlist) + Environment.NewLine);
for (int level = 0; level < this.next.Length; level++)
{
Vsync.WriteLine("<" + g.gname + ">: -- last[" + level + "]=" + (this.last[level] == null ? "(null)" : (this.last[level].isMyAddress() ? "(self)" : this.last[level].ToString())));
Vsync.WriteLine("<" + g.gname + ">: -- next[" + level + "]=" + (this.next[level] == null ? "(null)" : (this.next[level].isMyAddress() ? "(self)" : this.next[level].ToString())));
}
}
}
// Used to generate a working view by applying view deltas to the group view
// Parent group is the group within which the action is occurring, but this call can
// occur far into the future (it mostly updates the WorkingView structure) and we
// don't do callbacks at this time. So we just create a temporary, fake, Group object
// Records in parentGroup.myFirstLeadershipView the first group view in which I am the leader (left at 0 if none); used in flush
internal void applyViewDeltas(Group parentGroup, Vsync.ViewDelta[] newvds)
{
using (var tmpLockObj = new LockAndElevate(parentGroup.TokenLock))
using (var tmpLockObj1 = new LockAndElevate(parentGroup.ViewLock))
{
int priorView = -1;
if (this.WorkingView != null)
{
priorView = this.WorkingView.viewid;
}
this.updateViewDeltas(newvds);
using (Group g = new Group())
{
g.AggTypes = parentGroup.AggTypes;
g.gaddr = parentGroup.gaddr;
g.gname = parentGroup.gname;
using (var tmpLockObj2 = new LockAndElevate(g.ViewLock))
{
g.theView = this.WorkingView;
}
foreach (Vsync.ViewDelta vd in this.viewDeltas)
{
if (vd.prevVid == g.theView.viewid)
{
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine(" applying view delta = " + vd);
}
Vsync.UpdateGroupView(false, vd, g, "update theToken.WorkingView");
if (parentGroup.myFirstLeadershipView == 0 && g.IAmRank0())
{
parentGroup.myFirstLeadershipView = g.theView.viewid;
}
}
}
this.WorkingView = g.theView;
this.WorkingViewInstalledAt = Vsync.NOW;
if (parentGroup.gaddr != null && parentGroup.gaddr == Vsync.VSYNCMEMBERS.gaddr)
{
IPMCNewView(parentGroup.gaddr, this.WorkingView);
}
}
if (priorView == this.WorkingView.viewid)
{
return;
}
// Finally, recompute the token using the new WorkingView
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("reinitializing the token from the working view = " + this.WorkingView);
}
newToken(parentGroup);
}
}
internal void updateViewDeltas(Vsync.ViewDelta[] newvds)
{
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("TOKEN: updateViewDeltas called on a VD vector of length " + newvds.Length + ", with prior VDS = ");
foreach (Vsync.ViewDelta vd in this.viewDeltas)
{
Vsync.WriteLine(" " + vd);
}
Vsync.WriteLine("NEW VIEW DELTAS TO MERGE IN:");
foreach (Vsync.ViewDelta vd in newvds)
{
Vsync.WriteLine(" " + vd);
}
}
// First create one merged list of view deltas
List<Vsync.ViewDelta> newvdlist = new List<Vsync.ViewDelta>();
foreach (Vsync.ViewDelta vd in newvds)
{
bool fnd = false;
if (vd.prevVid < this.stableVID)
{
fnd = true;
}
else
{
foreach (Vsync.ViewDelta knownvd in this.viewDeltas)
{
if (vd.prevVid <= knownvd.prevVid || (knownvd.leaderId == vd.leaderId && knownvd.gaddr == vd.gaddr && knownvd.prevVid == vd.prevVid))
{
fnd = true;
break;
}
}
}
if (!fnd)
{
foreach (Vsync.ViewDelta knownvd in newvdlist)
{
if (knownvd.leaderId == vd.leaderId && knownvd.gaddr == vd.gaddr && knownvd.prevVid == vd.prevVid)
{
fnd = true;
break;
}
}
}
if (!fnd)
{
newvdlist.Add(vd);
}
}
if (newvdlist.Count == 0)
{
return;
}
using (var tmpLockObj = new LockAndElevate(this.slock))
{
Vsync.ViewDelta[] mergedVds = new Vsync.ViewDelta[this.viewDeltas.Length + newvdlist.Count];
int idx = 0;
int maxvid = -1;
foreach (Vsync.ViewDelta vd in this.viewDeltas)
{
mergedVds[idx++] = vd;
maxvid = vd.prevVid;
}
foreach (Vsync.ViewDelta vd in newvdlist)
{
if (vd.prevVid >= this.stableVID && vd.prevVid > maxvid)
{
mergedVds[idx++] = vd;
}
}
this.viewDeltas = mergedVds;
for (int level = 0; level < this.nlevels; level++)
{
// Deltas will be included on the next few tokens sent
this.includeViewDeltas[level] = 4;
}
this.fixVDS();
// The caller (applyViewDeltas) then applies the vds to create the working view using a fake group
if ((VsyncSystem.Debug & VsyncSystem.TOKENLOGIC) != 0)
{
Vsync.WriteLine("TOKEN: updateViewDeltas recomputed VD vector..." + this);
}
}
}
internal void fixVDS()
{
int idx = 0;
for (int i = 0; i < this.viewDeltas.Length; i++)
{
if (this.viewDeltas[i] != null && this.viewDeltas[i].prevVid >= this.stableVID)
{
this.viewDeltas[idx++] = this.viewDeltas[i];
}
}
if (idx != this.viewDeltas.Length)
{
Vsync.ArrayResize(ref this.viewDeltas, idx);
}
}
// Checks for holes in the ViewDelta list of the current token; used as a unit-check
internal void checkVDS(string fromWhere)
{
Group g = doLookup(this.gaddr);
if (g == null)
{
Vsync.WriteLine("Warning: CheckVDS(" + fromWhere + ") -- group lookup returned null");
return;
}
if ((g.flags & G_ISLARGE) == 0)
{
return;
}
tokenInfo theToken;
using (var tmpLockObj = new LockAndElevate(g.TokenLock))
{
theToken = g.theToken;
}
View theView;
using (var tmpLockObj = new LockAndElevate(g.ViewLock))
{
theView = g.theView;
}
if (theToken == null || theView == null)
{
Vsync.WriteLine("Warning: CheckVDS(" + fromWhere + ") -- theToken null or theView null");
return;
}
if (theToken != this)
{
Vsync.WriteLine("Warning: CheckVDS(" + fromWhere + ") -- theToken != g.theToken");
return;
}
if (theToken.viewDeltas == null || theToken.viewDeltas.Length == 0)
{
return;
}
Vsync.ViewDelta vd = theToken.viewDeltas[0];
if (vd.prevVid > theView.viewid)
{
throw new VsyncException("in checkVDS(" + fromWhere + ") vd[0].previd=" + vd.prevVid + " but theView=" + theView);
}
View wv = theToken.WorkingView;
if (vd.prevVid > wv.viewid)
{
throw new VsyncException("in checkVDS(" + fromWhere + ") vd[0].previd=" + vd.prevVid + " but WorkingView=" + wv);
}
}
/// <summary>
/// Constructor for tokens received via Vsync Msg layer
/// </summary>
/// <param name="ba">byte vector encoding a token</param>
public tokenInfo(byte[] ba)
{
int idx = 0;
object[] obs = Msg.BArrayToObjects(ba, typeof(Address), typeof(Address), typeof(Address), typeof(int), typeof(int), typeof(int), typeof(int), typeof(int), typeof(long), typeof(int), typeof(int), typeof(int), typeof(int), typeof(Vsync.ViewDelta[]), typeof(byte[][]));
this.gaddr = (Address)obs[idx++];
this.theGroup = Group.doLookup(this.gaddr);
this.groupOwner = (Address)obs[idx++];
this.sender = (Address)obs[idx++];
this.logicalClock = (int)obs[idx++];
this.state = (int)obs[idx++];
this.viewid = (int)obs[idx++];
this.stableAtSender = (int)obs[idx++];
this.alsoSeenBase = (int)obs[idx++];
this.alsoSeen = (long)obs[idx++];
this.aggStable = (int)obs[idx++];
this.tokenLevel = (int)obs[idx++];
this.stableTo = (int)obs[idx++];
this.stableVID = (int)obs[idx++];
this.viewDeltas = (Vsync.ViewDelta[])obs[idx++];
this.incomingValuesArray = (byte[][])obs[idx];
if (this.theGroup != null && this.theGroup.myAes != null)
{
using (var tmpLockObj = new LockAndElevate(this.theGroup.myAesLock))
{
if ((VsyncSystem.Debug & VsyncSystem.CI