High-Availability in AWS with keepalived, EBS and Elastic Network Interfaces

by Julian Dunn, Engineering Lead, Chef Software

Introduction

By now, nearly everyone knows that running infrastructure in AWS is not the same as running a traditional data center, which gives the lie to claims that you can just "lift and shift to the cloud". In AWS, one normally achieves high availability by scaling horizontally. For example, if you have a WordPress site, you could create several identical WordPress servers, put them all behind an Elastic Load Balancer (ELB), and connect them all to the same database. That way, if one of these servers fails, the ELB will stop directing traffic to it, but your site will still be available.

But what about that database? Isn't it also a single point of failure? You can't very well pull the same horizontal-redundancy trick for services that explicitly have one writer (and potentially many readers). For a database, you could probably use Amazon Relational Database Service (RDS), but suppose Amazon doesn't have a handy highly-available Platform-as-a-Service variant for the service you need?

In this post, I'll show you how to use that old standby, keepalived, in conjunction with Virtual Private Cloud (VPC) features, to achieve real high-availability in AWS for systems that can't be horizontally replicated.

Kit of Parts

To create high-availability out of two (or more) systems, you need the following components:

  • A service IP (commonly referred to as a VIP, for virtual IP) that clients connect to, and that can be moved between the systems
  • A block device containing the data served by the currently active system, which can be detached and reattached to another system should the active one fail
  • Some kind of cluster coordination system to handle master/backup election and do all the housekeeping of moving the service IP and block device to the active node.

In AWS, we'll use:

  • A secondary private IP address on an Elastic Network Interface (ENI) as the service IP.
  • A separate Elastic Block Store (EBS) volume as the block device.
  • keepalived as the cluster coordination system.

There are a few limitations to this approach in AWS. The most important is that both instances must live in the same VPC subnet (so the secondary IP can move between them), which implies that they are in the same availability zone (AZ); the EBS volume must be created in that same AZ as well.

Just Enough keepalived for HA

Keepalived for Linux has been around for over ten years, and while it is robust and reliable, it can be difficult to grasp because it is designed for a variety of use cases, some quite distinct from the one we are going to implement. Software design diagrams like this one do not necessarily aid in understanding how it works.

For the purposes of building an HA system, you need only know a few things about keepalived:

  • As previously mentioned, keepalived serves as a cluster coordination system between two or more peers.
  • Keepalived uses the Virtual Router Redundancy Protocol (VRRP) for assigning the service IP to the active instance. It does this by talking to the Linux netlink layer directly. Thus, don't try to use ifconfig to examine whether the master's interface has the VIP, as ifconfig doesn't use netlink system calls and the VIP won't show up! Use ip addr instead.
  • VRRP is normally run over multicast in a closed network segment. However, in a cloud environment where multicast is not permitted, we must use unicast, which implies that we need to list all peers participating in the cluster.
  • Keepalived has the ability to invoke external scripts whenever a cluster member transitions from backup to master (or vice-versa). We will use this functionality to associate and mount the EBS block device (or the inverse, when transitioning from master to backup).

Building the HA System

We'll spin up two identical systems in the same VPC subnet for our master and backup nodes. To avoid passing AWS access and secret keys to the systems, I've created an IAM instance profile & role called awsadvent-ha with a policy document to let the systems manage ENI addresses and EBS volumes:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:AssignPrivateIpAddresses"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
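If you'd rather create the role and instance profile from the command line as well, one way it might look with the AWS CLI is sketched below, assuming the policy document above is saved as awsadvent-ha-policy.json and a standard EC2 trust policy (allowing ec2.amazonaws.com to assume the role) is saved as ec2-trust.json; both filenames are placeholders for this sketch:

$ aws iam create-role --role-name awsadvent-ha --assume-role-policy-document file://ec2-trust.json
$ aws iam put-role-policy --role-name awsadvent-ha --policy-name awsadvent-ha --policy-document file://awsadvent-ha-policy.json
$ aws iam create-instance-profile --instance-profile-name awsadvent-ha
$ aws iam add-role-to-instance-profile --instance-profile-name awsadvent-ha --role-name awsadvent-ha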

For this exercise I used Fedora 21 AMIs, because Fedora has a recent-enough version of keepalived with VRRP-over-unicast support:

$ aws ec2 run-instances --image-id ami-164cd77e --key-name us-east1-jdunn --security-groups internal-icmp ssh-only --instance-type t1.micro --subnet-id subnet-c0ffee11 --iam-instance-profile Name=awsadvent-ha --count 2

You'll notice that one of the security groups I've placed the machines into is entitled internal-icmp, which is a group I created to allow the instances to ping each other (send ICMP Echo Request and receive ICMP Echo Reply). This is what keepalived will use as a heartbeat mechanism between nodes.
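If you don't already have such a group, a rough sketch of creating it with the AWS CLI might look like this; the VPC ID, security group ID, and CIDR below are placeholders you'd replace with your own values:

$ aws ec2 create-security-group --group-name internal-icmp --description "ICMP between HA peers" --vpc-id vpc-11111111
$ aws ec2 authorize-security-group-ingress --group-id sg-22222222 --protocol icmp --port -1 --cidr 172.31.0.0/16

One caveat: the VRRP advertisements that keepalived sends between unicast peers are carried as IP protocol 112, so depending on your other security group rules you may also need to permit that protocol between the two instances.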

We also need a separate EBS volume for the data, so let's create one in the same AZ as the instances:

$ aws ec2 create-volume --size 10 --availability-zone us-east-1a --volume-type gp2

Note that the volume needs to be partitioned and formatted at some point; I don't do that in this tutorial.
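If you'd like to take care of that up front, a minimal sketch looks like the following, assuming the volume ID returned by create-volume is vol-33333333, the first instance is i-44444444, the device shows up as /dev/xvdf (device naming varies by AMI and virtualization type), and you're happy putting an ext4 filesystem straight on the raw device:

$ aws ec2 attach-volume --volume-id vol-33333333 --instance-id i-44444444 --device /dev/xvdf
$ sudo mkfs.ext4 /dev/xvdf
$ sudo mount /dev/xvdf /mnt

Once it's formatted, unmount and detach the volume again (sudo umount /mnt, then aws ec2 detach-volume --volume-id vol-33333333) so that keepalived's notify script can manage the attachment from here on.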

Installing and configuring keepalived

Once the two machines are up and reachable, it's time to install and configure keepalived. SSH to them and type:

$ sudo yum -y install keepalived

I intend to write the external failover scripts called by keepalived in Ruby, so I'm going to install that, and the fog gem that will let me communicate with the AWS API:

$ sudo yum -y install ruby rubygem-fog
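A quick sanity check that the gem is installed and loadable (Fog::VERSION is defined by the fog gem):

$ ruby -e 'require "fog"; puts Fog::VERSION'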

keepalived is configured using the /etc/keepalived/keepalived.conf file. Here's the configuration I used for this demo:

global_defs {
   notification_email {
     jdunn@chef.io
   }
   notification_email_from keepalived@chef.io
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
}

vrrp_sync_group VG_1 {
  group {
    VI_1
  }
}

vrrp_instance VI_1 {
    state MASTER
    ! nopreempt: allow lower priority machine to maintain master role
    nopreempt
    interface eth0
    virtual_router_id 1
    priority 100
    notify_backup "/etc/keepalived/awsha.rb backup"
    notify_master "/etc/keepalived/awsha.rb master"
    notify_fault  "/etc/keepalived/awsha.rb fault"
    notify_stop "/etc/keepalived/awsha.rb backup"
    unicast_src_ip 172.31.40.96
    unicast_peer {
        172.31.40.95
    }
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass generate-a-real-password-here
    }
    virtual_ipaddress {
        172.31.36.57 dev eth0
    }
}

A couple of notes about this configuration:

  • 172.31.40.96 is the current machine; 172.31.40.95 is its peer. The peer has these IPs reversed in the unicast_src_ip and unicast_peer clauses, so make sure to change them on each node; the metadata lookup shown below helps. (A configuration management system sure would help here...)
  • 172.31.36.57 is the virtual IP address which will be bound as a secondary IP address to the active master's Elastic Network Interface. You can pick anything unused in your subnet.
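An easy way to find the private IP to put in unicast_src_ip on each node, as mentioned above, is the EC2 instance metadata service:

$ curl http://169.254.169.254/latest/meta-data/local-ipv4
172.31.40.96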

The notify script, awsha.rb

As previously mentioned, the external script is invoked whenever a master-to-backup or backup-to-master event occurs, via the notify_backup and notify_master directives in keepalived.conf. Upon receiving an event, it will attach and mount (or unmount and detach) the EBS volume, and assign or release the secondary IP address on the ENI.

The script is too long to reproduce inline here, so I've included it as a separate Gist.

Note: For brevity, I've eliminated a lot of error handling from the script, so it may or may not work out of the box. In a real implementation, you need to check for many error conditions: open files on the disk volume, polling the EC2 API until the volume has attached or detached, and so on.
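To give a concrete sense of what the script needs to do, here is a rough shell sketch of the same steps using the AWS CLI instead of Ruby and fog; this is not the script from the Gist, and the ENI ID, volume ID, and device name are placeholders:

#!/bin/sh
# Hypothetical values; the real script discovers these via the EC2 API.
ENI_ID=eni-55555555
VOLUME_ID=vol-33333333
VIP=172.31.36.57
DEVICE=/dev/xvdf

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

case "$1" in
  master)
    # Move the VIP to this instance's ENI at the AWS level
    # (keepalived has already bound it at the OS level via netlink).
    aws ec2 assign-private-ip-addresses --network-interface-id "$ENI_ID" \
      --private-ip-addresses "$VIP" --allow-reassignment
    # Claim the shared data volume and mount it.
    # (A real script would poll until the attachment completes.)
    aws ec2 attach-volume --volume-id "$VOLUME_ID" \
      --instance-id "$INSTANCE_ID" --device "$DEVICE"
    mount "$DEVICE" /mnt
    ;;
  backup|fault)
    # Give up the volume so the new master can claim it.
    umount /mnt
    aws ec2 detach-volume --volume-id "$VOLUME_ID"
    ;;
esac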

Putting it all together

Start keepalived on both servers:

$ sudo service keepalived start

One of them will elect itself the master, assign the ENI secondary IP to itself, and attach and mount the block device on /mnt. You can see which is which by checking the service status:

ip-172-31-40-96:~$ sudo systemctl status -l keepalived.service
...
Dec 09 21:14:44 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 172.31.36.57
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: Opening script file /etc/keepalived/awsha.rb
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_healthcheckers[12270]: Netlink reflector reports IP 172.31.36.57 added
Dec 09 21:14:45 ip-172-31-40-96.ec2.internal Keepalived_vrrp[12271]: VRRP_Group(VG_1) Syncing instances to MASTER state
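You can also confirm which node currently holds the VIP with ip addr, as mentioned earlier (ifconfig won't show it); on the master, 172.31.36.57 should appear as an additional inet address on eth0:

ip-172-31-40-96:~$ ip addr show dev eth0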

The other machine will say that it's transitioned to backup state:

Dec 09 21:14:46 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Instance(VI_1) Entering BACKUP STATE

To force a failover, stop keepalived on the current master (sudo service keepalived stop). The backup system will detect that the master has gone away and transition to master:

ip-172-31-40-95:~$ sudo systemctl status -l keepalived.service
...
Dec 09 21:25:05 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 09 21:25:05 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 09 21:25:06 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 09 21:25:06 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 09 21:25:06 ip-172-31-40-95.ec2.internal Keepalived_healthcheckers[1970]: Netlink reflector reports IP 172.31.36.57 added
Dec 09 21:25:06 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 172.31.36.57
Dec 09 21:25:06 ip-172-31-40-95.ec2.internal Keepalived_vrrp[1971]: Opening script file /etc/keepalived/awsha.rb

After a while, the backup should be reachable on the VIP, and have the disk volume mounted under /mnt.
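A couple of quick checks from the new master (or, for the ping, from any other host in the subnet) make this easy to verify:

$ ping -c 3 172.31.36.57
$ df -h /mnt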

If you now start keepalived on the old master, it should come back online as the new backup.

Wrapping Up

As we've seen, it's not always possible to architect systems in AWS for horizontal redundancy. Many pieces of software, particularly those involving one writer and many readers, cannot be set up this way.

In other situations, it's not desirable to build horizontal redundancy. One real-life example is a highly-available large shared cache system (e.g. squid or varnish) where it would be costly to rebuild terabytes of cache on instance failure. At Chef Software, we use an expanded version of the tools shown here to implement our Chef Server High-Availability solution.

Finally, I also found this presentation by an AWS solutions architect in Japan very useful in identifying what L2 and L3 networking technologies are available in AWS: http://www.slideshare.net/kentayasukawa/ip-multicast-on-ec2
