@markmandel
Last active September 25, 2018 14:03
agones-365

Problem

  • To build cluster autoscalers, we need a way to declare the desired number of nodes in a cluster -- one that is GameServer aware, and does not shut down nodes that have Allocated GameServers on them.
  • Current autoscalers will not work, as they are built for stateless workloads, and will therefore terminate nodes arbitrarily.
  • This solution should be designed such that it can be used on multiple clouds, and can be independently extended by those that run their own Kubernetes clusters on their own infrastructure.

Design

To control the number of nodes, we can use a CRD and controller, much as we would anything else.

If the CRD is not created in the cluster, then the controller will do nothing.

There could be multiple GameServerNode instances - for example, when you have multiple node pools in a GKE cluster, you could have one for each node pool.

The following is an example for manipulating nodes on GKE; if you see issues with other infrastructure, please raise them.

apiVersion: "stable.agones.dev/v1alpha1"
kind: GameServerNode
metadata:
  # example name only
  name: gameservernode-example
spec:
  # the target number of nodes, to be reached safely.
  replicas: 3
  # defaults to auto detection, but could be: GKE, AKS, etc, or custom.
  # For initial implementation, we will do minikube (does nothing) and GKE
  provider: auto
  # this is a map[string]string, that has provider specific configuration. Validated through `Validate`
  config:
      nodepool: gameserver
  # if custom provider is set, then configure the service details here.
  grpcHook:
    # base64 https cert public key
    caBundle: "87zyhshjds---"
    # optional external cluster address
    address: domain:port
    # in cluster service reference (instead of address above)
    service:
      name: gameServerNode
      namespace: agones-system
  • Question: Not sold on the Kind "GameServerNode" -- anyone got anything better?
  • Question: I'm assuming we'll use the same cert that we use for webhooks, but the CRD creation will happen outside of the helm install process (at least in this design), so we should document how to get the public key from the secret - I assume that this will work? Should test.
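
To make that shape concrete, here is a minimal sketch of what the Go types for this CRD could look like, assuming the usual apimachinery conventions -- all type and field names below are illustrative, not a final API.

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GameServerNode declares the desired number of game server nodes for a
// given provider / node pool. (Kind name still under discussion.)
type GameServerNode struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec GameServerNodeSpec `json:"spec"`
}

// GameServerNodeSpec mirrors the YAML example above.
type GameServerNodeSpec struct {
	// Replicas is the target number of nodes, to be reached safely.
	Replicas int64 `json:"replicas"`
	// Provider defaults to auto detection, but could be GKE, AKS, etc, or custom.
	Provider string `json:"provider"`
	// Config is provider specific configuration, validated through the
	// provider's Validate rpc.
	Config map[string]string `json:"config,omitempty"`
	// GRPCHook configures the service details if a custom provider is set.
	GRPCHook *GRPCHook `json:"grpcHook,omitempty"`
}

// GRPCHook points at the custom provider's gRPC service, in or out of cluster.
type GRPCHook struct {
	// CABundle is the base64 https cert public key.
	CABundle string `json:"caBundle,omitempty"`
	// Address is an optional external cluster address (domain:port).
	Address string `json:"address,omitempty"`
	// Service is an in-cluster service reference, used instead of Address.
	Service *ServiceReference `json:"service,omitempty"`
}

// ServiceReference is an in-cluster reference to the provider service.
type ServiceReference struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}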

Provider Implementation

The cloud/custom implementation will be a gRPC service that can be hosted in cluster, or externally. Default implementations for cloud providers will be hosted internally and come bundled with Agones. This will be a separate binary and deployment from the controller, but in the same namespace.

The proto definition for a K8s provider service:

syntax = "proto3";

service GameServerNode {
    // validates the configuration for a given provider
    rpc Validate(Config) returns (Valid) {}
    // increase the number of nodes. Should only ever increase!
    rpc Increase (Increase) returns (Empty) {}
    // delete one or more specific nodes
    rpc Delete (Nodes) returns (Empty) {}
}

// list of nodes to delete, with the config definitions
message Nodes {
    // list of the K8s node names
    repeated string names = 1;
    Config config = 2;
}

// increase target, sent with config definitions
message Increase {
    int64 increase_size = 1;
    Config config = 2;
}

// the annotations/config definitions
message Config {
    // which provider to use
    string provider = 1;
    map<string, string> values = 2;
}

message Valid {
    // is this config valid ?
    bool is_valid = 1;
    // if not, provide details
    repeated Detail details = 2;

    message Detail {
        // Type matches https://godoc.org/k8s.io/apimachinery/pkg/util/validation/field#ErrorType
        string type = 1;
        //  validation error message
        string message = 2;
        // the field in question
        string field = 3;
    }
}

// I am Empty
message Empty {}
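
As a sketch of the provider side, a "does nothing" implementation (along the lines of the proposed minikube provider) could look roughly like the following, assuming Go stubs are generated from the proto above. The pb import path, the RegisterGameServerNodeServer function and the field/getter names are hypothetical stand-ins for whatever protoc actually produces.

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	// Hypothetical import path for the Go code generated from the proto above.
	pb "agones.dev/agones/pkg/gameservernode"
)

// noopProvider is a provider implementation that does nothing, along the
// lines of the proposed minikube provider.
type noopProvider struct{}

// Validate accepts any configuration.
func (p *noopProvider) Validate(ctx context.Context, c *pb.Config) (*pb.Valid, error) {
	return &pb.Valid{IsValid: true}, nil
}

// Increase would grow the node pool; here it only logs the request.
func (p *noopProvider) Increase(ctx context.Context, inc *pb.Increase) (*pb.Empty, error) {
	log.Printf("would increase by %d nodes (config: %v)", inc.GetIncreaseSize(), inc.GetConfig().GetValues())
	return &pb.Empty{}, nil
}

// Delete would remove the backing VMs for the named nodes.
func (p *noopProvider) Delete(ctx context.Context, nodes *pb.Nodes) (*pb.Empty, error) {
	log.Printf("would delete nodes: %v", nodes.GetNames())
	return &pb.Empty{}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	// RegisterGameServerNodeServer is the registration function protoc
	// would generate for the GameServerNode service.
	pb.RegisterGameServerNodeServer(s, &noopProvider{})
	log.Fatal(s.Serve(lis))
}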

Node scaling strategy

Scaling up

  • Diff the current number of nodes with the target.
  • If there are nodes that are marked as "Unschedulable", then flip the diff'd number of them (or as many as are available) back to being "Schedulable".
    • Sort those nodes by the largest number of game pods per node, so that we re-open the most active nodes first, and can more easily scale down the emptier ones.
  • Add nodes to cover the remaining difference to the target through the gRPC service Increase (see the sketch after this list).
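
A rough sketch of that scale-up pass using client-go is below. The pb package is the same hypothetical generated package as in the provider sketch, and the agones.dev/role=gameserver label used to count game server pods is an assumption, as is the helper name.

package nodescaler

import (
	"context"
	"sort"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"

	// Hypothetical generated package, as in the provider sketch above.
	pb "agones.dev/agones/pkg/gameservernode"
)

// scaleUp moves the cluster up towards the target number of replicas. It
// first re-opens cordoned nodes (busiest first), and only asks the provider
// for brand new nodes to cover whatever difference remains.
func scaleUp(ctx context.Context, kube kubernetes.Interface, provider pb.GameServerNodeClient, replicas int64, cfg *pb.Config) error {
	nodes, err := kube.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}

	var schedulable int
	var cordoned []corev1.Node
	for _, n := range nodes.Items {
		if n.Spec.Unschedulable {
			cordoned = append(cordoned, n)
		} else {
			schedulable++
		}
	}

	diff := int(replicas) - schedulable
	if diff <= 0 {
		return nil
	}

	// Re-open cordoned nodes with the most game pods first, so the emptier
	// nodes stay easy to scale down later.
	sort.Slice(cordoned, func(i, j int) bool {
		return gameServerPodCount(ctx, kube, cordoned[i].Name) > gameServerPodCount(ctx, kube, cordoned[j].Name)
	})
	for i := range cordoned {
		if diff == 0 {
			break
		}
		cordoned[i].Spec.Unschedulable = false
		if _, err := kube.CoreV1().Nodes().Update(ctx, &cordoned[i], metav1.UpdateOptions{}); err != nil {
			return err
		}
		diff--
	}

	// Anything left over becomes a request to the provider for new nodes.
	if diff > 0 {
		_, err = provider.Increase(ctx, &pb.Increase{IncreaseSize: int64(diff), Config: cfg})
	}
	return err
}

// gameServerPodCount counts the game server pods on a node. The label
// selector is an assumption about how Agones labels its pods.
func gameServerPodCount(ctx context.Context, kube kubernetes.Interface, nodeName string) int {
	pods, err := kube.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		LabelSelector: "agones.dev/role=gameserver",
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return 0
	}
	return len(pods.Items)
}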

Scaling Down

  • Diff the current number of nodes with the target.
  • Identify the diff'd number of nodes with the least number of Allocated game servers on them and mark them as Unschedulable.
  • Delete the non-Allocated GameServers on those cordoned nodes, so they can't move to Allocated and keep the node alive.
  • Once a node has zero game server pods on it, delete the node from K8s and use the gRPC service Delete to delete the backing VMs (see the sketch after this list).
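
And a matching sketch of the scale-down pass, in the same hypothetical package as the scale-up sketch (it reuses gameServerPodCount from there). allocatedGameServerCount and deleteUnallocatedGameServers stand in for lookups and deletes through the Agones clientset, and are also hypothetical.

// scaleDown moves the cluster down towards the target number of replicas:
// cordon the emptiest nodes, clear out their non-Allocated GameServers, and
// only hand fully drained nodes to the provider for VM deletion.
func scaleDown(ctx context.Context, kube kubernetes.Interface, provider pb.GameServerNodeClient, replicas int64, cfg *pb.Config) error {
	nodes, err := kube.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	diff := len(nodes.Items) - int(replicas)
	if diff <= 0 {
		return nil
	}

	// Prefer the nodes with the fewest Allocated game servers on them.
	candidates := nodes.Items
	sort.Slice(candidates, func(i, j int) bool {
		return allocatedGameServerCount(ctx, candidates[i].Name) < allocatedGameServerCount(ctx, candidates[j].Name)
	})

	var drained []string
	for i := range candidates[:diff] {
		n := &candidates[i]
		// Cordon the node so nothing new lands on it.
		n.Spec.Unschedulable = true
		if _, err := kube.CoreV1().Nodes().Update(ctx, n, metav1.UpdateOptions{}); err != nil {
			return err
		}
		// Remove GameServers that aren't Allocated, so they can't move to
		// Allocated and keep the node alive.
		if err := deleteUnallocatedGameServers(ctx, n.Name); err != nil {
			return err
		}
		// Only nodes that are already down to zero game server pods get
		// deleted now; the rest are picked up on a later reconcile.
		if gameServerPodCount(ctx, kube, n.Name) == 0 {
			if err := kube.CoreV1().Nodes().Delete(ctx, n.Name, metav1.DeleteOptions{}); err != nil {
				return err
			}
			drained = append(drained, n.Name)
		}
	}

	// Ask the provider to remove the backing VMs for the drained nodes.
	if len(drained) > 0 {
		_, err = provider.Delete(ctx, &pb.Nodes{Names: drained, Config: cfg})
	}
	return err
}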

Research
