
Failsafe Audit & Doc

Below is a description of the actions P2 takes for each of its entities in response to the three types of change seen via a Watch (Delete, Create, and Mutate).

intent

Intent entities contain pod manifests and describe the desired state of the system. Intent entries can be mutated by a human (with sufficient ACL), by a daemon set, or, most often, by a Rolling Update (commonly called a deploy).

The preparer watches intent on a per-host basis, responding to changes for the host it manages (e.g. /intent/awa101.sjc1.square/*).
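
As a rough mental model, the watch can be pictured as a Consul blocking query over the host's subtree. Below is a minimal sketch using Consul's Go API; the key layout and handling are illustrative, not the preparer's actual code.

```go
package main

import (
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	kv := client.KV()

	// Watch only the subtree for the host this preparer manages.
	prefix := "intent/awa101.sjc1.square/"

	var lastIndex uint64
	for {
		// Blocking query: returns when the subtree changes (or on timeout).
		pairs, meta, err := kv.List(prefix, &api.QueryOptions{WaitIndex: lastIndex})
		if err != nil {
			log.Printf("watch error: %v", err)
			time.Sleep(time.Second)
			continue
		}
		if meta.LastIndex == lastIndex {
			continue // timed out with no change
		}
		lastIndex = meta.LastIndex

		// Each value is a pod manifest; hand the full set to the
		// reconciliation loop (compare against /reality, below).
		for _, pair := range pairs {
			log.Printf("manifest at %s: %d bytes", pair.Key, len(pair.Value))
		}
	}
}
```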

Delete

  1. Deletion of intent is the way pod uninstallation is implemented. Historically, the p2-preparer has implemented a failsafe: it takes no action if it does not see itself in /intent. This has the effect of preventing mass uninstallation if the host's intent subtree is missing; a sketch of the check follows below.
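
A minimal sketch of that check, with illustrative names (the preparer's pod ID is an assumption here):

```go
package main

import "fmt"

// intentFailsafe is illustrative, not p2's actual API. The preparer
// runs as a pod itself, so if its own manifest is absent from the
// host's /intent subtree, the data is assumed lost and no action is
// taken; this is what prevents mass uninstallation.
func intentFailsafe(manifests map[string][]byte) error {
	// "p2-preparer" is an assumed pod ID for the preparer itself.
	if _, ok := manifests["p2-preparer"]; !ok {
		return fmt.Errorf("preparer not present in /intent; refusing to act")
	}
	return nil
}

func main() {
	fmt.Println(intentFailsafe(map[string][]byte{})) // refuses to act
}
```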

Create

  1. This signals installation of a pod. The preparer responds to this change by downloading, unpacking, and installing the application described in the pod manifest.

Mutate

  1. This signals a version change of a pod.

reality

Reality entities are how the preparer tracks its work. Upon completing the installation of a pod, the preparer writes the manifest to reality. Whenever /intent and /reality differ, the preparer takes action to converge reality toward intent (a sketch follows below). There are no watches on the /reality tree.
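
A minimal sketch of the comparison, assuming manifests are keyed by pod ID and compared byte-for-byte; the real reconciliation (hooks, artifact verification, launchables) is far richer:

```go
package main

import (
	"bytes"
	"fmt"
)

type action struct {
	podID string
	verb  string // "install", "update", or "uninstall"
}

// diff returns the actions needed to converge reality toward intent.
func diff(intent, reality map[string][]byte) []action {
	var out []action
	for podID, want := range intent {
		have, ok := reality[podID]
		switch {
		case !ok:
			out = append(out, action{podID, "install"})
		case !bytes.Equal(want, have):
			out = append(out, action{podID, "update"})
		}
	}
	for podID := range reality {
		if _, ok := intent[podID]; !ok {
			out = append(out, action{podID, "uninstall"})
		}
	}
	return out
}

func main() {
	intent := map[string][]byte{"web": []byte("v2")}
	reality := map[string][]byte{"web": []byte("v1"), "worker": []byte("v1")}
	// After each completed install, the preparer writes the manifest to
	// /reality, so this diff converges to empty.
	fmt.Println(diff(intent, reality)) // [{web update} {worker uninstall}]
}
```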

daemon sets

A daemon set manages pods at scale by mutating /intent for hosts that match its node selector. Daemon sets differ from Replication Controllers in that they are not bounded by a replica count and in the way that versions are managed.
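
The matching step can be sketched with Kubernetes-style label selectors, which is what p2's node selectors resemble; the labels and nodes below are made up:

```go
package main

import (
	"fmt"
	"log"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// A daemon set's node selector, in Kubernetes selector syntax.
	selector, err := labels.Parse("datacenter=sjc1,pod_type=web")
	if err != nil {
		log.Fatal(err)
	}

	nodes := map[string]labels.Set{
		"awa101.sjc1.square": {"datacenter": "sjc1", "pod_type": "web"},
		"bwa201.iad2.square": {"datacenter": "iad2", "pod_type": "web"},
	}

	for node, nodeLabels := range nodes {
		if selector.Matches(nodeLabels) {
			// The daemon set would write /intent/<node>/<pod> here.
			fmt.Printf("%s matches; eligible for /intent writes\n", node)
		}
	}
}
```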

Delete

  1. Until very recently, daemon set removal would cause all of the managed pods to be removed from /intent. This had the effect of uninstalling apps.

  2. Today, daemon set deletion is a no-op [square/p2#754]. /intent records will remain until they are removed by human intervention.

Create

  1. Daemon set replication will begin writing /intent records for hosts that match its node selector, subject to a rate limit.

Mutate

  1. As with Create, daemon set replication will begin writing /intent records for hosts that match its node selector, subject to a rate limit; see the sketch below.
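
A minimal sketch of the rate-limited writes, using a token-bucket limiter from golang.org/x/time/rate; writeIntent is a hypothetical stand-in for the Consul KV write:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"golang.org/x/time/rate"
)

// writeIntent is hypothetical; in p2 this would be a Consul KV write of
// the daemon set's manifest under /intent/<node>/<pod>.
func writeIntent(node string, manifest []byte) error {
	fmt.Printf("wrote %d-byte manifest under /intent/%s/\n", len(manifest), node)
	return nil
}

func replicate(ctx context.Context, matchingNodes []string, manifest []byte) error {
	// At most one write per second, so a daemon set with a broad node
	// selector cannot stampede the cluster.
	limiter := rate.NewLimiter(rate.Limit(1), 1)
	for _, node := range matchingNodes {
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		if err := writeIntent(node, manifest); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	nodes := []string{"awa101.sjc1.square", "awa102.sjc1.square"}
	if err := replicate(context.Background(), nodes, []byte("pod manifest")); err != nil {
		log.Fatal(err)
	}
}
```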

pod clusters

A Pod Cluster is a representation of the semi-persistent configuration required to deploy a pod in a datacenter. Today, it is used primarily to automate the management of an app's VIPs on the F5 load balancers.

Delete

  1. This will remove any VIPs described by the Pod Cluster. A failsafe has been added [square/p2#755] so that we don't remove all VIPs in the case of an empty (nonsensical) response from the watch; a sketch of the check follows below.
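
A sketch of that failsafe, with an illustrative type and signature:

```go
package main

import (
	"errors"
	"fmt"
)

// PodCluster stands in for p2's pod cluster type.
type PodCluster struct{ ID string }

// validateWatchResult is illustrative of the failsafe in square/p2#755:
// a watch result claiming every pod cluster vanished is far more likely
// a bad read than a deliberate mass deletion, so refuse to remove VIPs.
func validateWatchResult(clusters []PodCluster) error {
	if len(clusters) == 0 {
		return errors.New("watch returned zero pod clusters; refusing to remove all VIPs")
	}
	return nil
}

func main() {
	if err := validateWatchResult(nil); err != nil {
		fmt.Println(err)
	}
}
```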

Create

  1. This creates VIPs and writes a key to /status to mark that the work has been completed, which minimizes load on the F5; see the sketch below.
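
A minimal sketch of the idempotent create path; statusKeyExists, writeStatusKey, and createVIPs are hypothetical stand-ins for the Consul /status accessors and the F5 client:

```go
package main

import "fmt"

// Hypothetical stand-ins for Consul /status reads/writes and the F5 client.
var statusKeys = map[string]bool{}

func statusKeyExists(clusterID string) bool { return statusKeys[clusterID] }
func writeStatusKey(clusterID string)       { statusKeys[clusterID] = true }
func createVIPs(clusterID string) error {
	fmt.Println("configuring F5 VIPs for", clusterID)
	return nil
}

// ensureVIPs is idempotent: the /status key records completed work, so a
// re-delivered watch event becomes a cheap no-op instead of another
// round trip to the F5.
func ensureVIPs(clusterID string) error {
	if statusKeyExists(clusterID) {
		return nil
	}
	if err := createVIPs(clusterID); err != nil {
		return err
	}
	writeStatusKey(clusterID)
	return nil
}

func main() {
	_ = ensureVIPs("web-sjc1") // hits the F5
	_ = ensureVIPs("web-sjc1") // no-op; /status key already present
}
```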

Mutate

  1. In certain cases, this modifies the VIPs described by the Pod Cluster.

rcstore

A replication controller is responsible for managing the replica set of pods across a cluster of hosts. It consists of a node selector and a desired number of replicas.

Delete

  1. The code managing the /intent records for the hosts matching the node selector halts gracefully. /intent records are preserved. There is no failsafe for all /replicationcontroller records disappearing at once.

Create

  1. P2 will spin off a goroutine to manage the /intent records for the hosts matching the node selector.

Mutate

  1. Mutation of a node selector has no effect today, though it will in the future. Mutation of the desired replica count will cause nodes to be scheduled or unscheduled appropriately; see the sketch below.
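
A minimal sketch of replica-count reconciliation; scheduleNode and unscheduleNode are hypothetical stand-ins for the RC's /intent writes and removals:

```go
package main

import "fmt"

func scheduleNode(node string)   { fmt.Println("scheduled", node) }
func unscheduleNode(node string) { fmt.Println("unscheduled", node) }

// reconcile adjusts the set of scheduled nodes to match the desired
// replica count, drawing new nodes from those matching the selector.
func reconcile(current, eligible []string, desired int) []string {
	scheduled := map[string]bool{}
	for _, n := range current {
		scheduled[n] = true
	}
	// Too few replicas: schedule onto eligible, unoccupied nodes.
	for _, n := range eligible {
		if len(current) >= desired {
			break
		}
		if !scheduled[n] {
			scheduleNode(n)
			scheduled[n] = true
			current = append(current, n)
		}
	}
	// Too many replicas: unschedule from the end.
	for len(current) > desired {
		n := current[len(current)-1]
		unscheduleNode(n)
		current = current[:len(current)-1]
	}
	return current
}

func main() {
	nodes := reconcile(nil, []string{"awa101.sjc1.square", "awa102.sjc1.square"}, 2)
	nodes = reconcile(nodes, nil, 1) // replica count mutated downward
	_ = nodes
}
```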

rollstore

A "roll" can be thought of as two Replication Controllers plus some metadata. Changes to this tree are what cause Appdash deploys to happen.

Delete

  1. The code managing the state transition between the two Replication Controllers halts, and the Replication Controllers are left in place. There is no failsafe for all /rolls disappearing simultaneously.

Create

  1. A deploy commences! (See the sketch below for the shape of the state transition.)
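
A minimal sketch of the state transition; rollExists and setDesiredReplicas are hypothetical stand-ins for rollstore reads and rcstore writes:

```go
package main

import "fmt"

var rollDeleted = false

func rollExists() bool { return !rollDeleted }
func setDesiredReplicas(rc string, n int) {
	fmt.Printf("%s desired replicas -> %d\n", rc, n)
}

// runRoll shifts replicas from the old RC to the new one, one step at a
// time. If the roll record disappears mid-flight, the loop halts and
// both Replication Controllers are left in place at their current
// counts (matching the Delete behavior above).
func runRoll(oldRC, newRC string, total int) {
	for moved := 1; moved <= total; moved++ {
		if !rollExists() {
			return
		}
		setDesiredReplicas(newRC, moved)
		setDesiredReplicas(oldRC, total-moved)
	}
}

func main() {
	runRoll("web-old", "web-new", 3)
}
```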

statusstore

Delete

  1. A record being removed from /status may cause the corresponding work to be retried (e.g., a Pod Cluster's VIP setup, which consults /status before calling the F5).

Create

  1. No side-effects.