AWS NAT HA with IPv6, Amazon Linux 2 and using aws-cli instead of deprecated apitools. Compatible with t4g Arm64 instances.
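One way to launch the template is with `aws cloudformation deploy`. The sketch below assumes the template has been saved locally; the file name `nat-ha.yaml`, the stack name, the key pair name and the SSH CIDR are all placeholders, and `AWS_CLI` is a hypothetical override added here only so the command can be printed instead of executed.

```shell
#!/bin/sh
# Sketch: deploy the NAT HA template with the AWS CLI.
# AWS_CLI is a hypothetical override used only for dry runs (e.g. AWS_CLI=echo);
# the file name, stack name, key pair and CIDR passed in below are placeholders.
deploy_nat_stack() {
  template_file="$1"   # local path to the template
  stack_name="$2"      # CloudFormation stack name
  key_name="$3"        # existing EC2 key pair (KeyName parameter)
  ssh_cidr="$4"        # CIDR allowed to SSH in (SshAllowedIp parameter)

  # CAPABILITY_IAM is required because the template creates an IAM role.
  ${AWS_CLI:-aws} cloudformation deploy \
    --template-file "$template_file" \
    --stack-name "$stack_name" \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides "KeyName=$key_name" "SshAllowedIp=$ssh_cidr"
}

# Dry run in a subshell: print the command instead of calling AWS.
(AWS_CLI=echo; deploy_nat_stack nat-ha.yaml nat-ha my-keypair 203.0.113.10/32)
```

The availability zone defaults are eu-west-1a and eu-west-1b, so either deploy to eu-west-1 or add `AvailabilityZone1=...` and `AvailabilityZone2=...` to the overrides. When choosing an x86 instance type (t2, t3, t3a), also override `AwsNatAMI` with the x86_64 parameter path.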
AWSTemplateFormatVersion: 2010-09-09
Description: >-
  NAT HA: creates two NAT nodes in a new
  VPC in a hot/hot NAT configuration. After successfully launching this
  CloudFormation stack, you will have 4 subnets in 2 AZs (a pair of
  public/private subnets in each AZ), with NAT instances routing outbound
  traffic for their respective private subnets. The NAT instances will
  automatically monitor each other and fix outbound routing problems if the
  other instance is unavailable.
  Based on https://aws.amazon.com/fr/articles/high-availability-for-amazon-vpc-nat-instances-using-aws-cloudformation-templates/
Parameters:
  SshAllowedIp:
    Description: IP range (CIDR format) allowed SSH access to the NAT instances through the security group.
    Type: String
    Default: 1.2.3.4/32
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instances
    Type: String
    MinLength: '1'
    MaxLength: '64'
    AllowedPattern: '[-_ a-zA-Z0-9]*'
    ConstraintDescription: 'can contain only alphanumeric characters, spaces, dashes and underscores.'
  VpcCidr:
    Description: CIDR address for the VPC to be created.
    Type: String
    Default: 10.0.0.0/16
  PublicSubnet1:
    Description: Address range for a public subnet to be created in AZ1.
    Type: String
    Default: 10.0.0.0/24
  PublicSubnet2:
    Description: Address range for a public subnet to be created in AZ2.
    Type: String
    Default: 10.0.2.0/24
  PrivateSubnet1:
    Description: Address range for a private subnet to be created in AZ1.
    Type: String
    Default: 10.0.1.0/24
  PrivateSubnet2:
    Description: Address range for a private subnet to be created in AZ2.
    Type: String
    Default: 10.0.3.0/24
  NATNodeInstanceType:
    Description: Instance type for NAT nodes.
    Type: String
    Default: t4g.micro
    AllowedValues:
      - t4g.nano
      - t4g.micro
      - t3a.nano
      - t3.nano
      - t2.nano
      - t3a.micro
      - t3.micro
      - t2.micro
    ConstraintDescription: must be a valid EC2 instance type.
  AvailabilityZone1:
    Description: First AZ to use for PublicSubnet1/PrivateSubnet1.
    Type: String
    Default: eu-west-1a
  AvailabilityZone2:
    Description: Second AZ to use for PublicSubnet2/PrivateSubnet2.
    Type: String
    Default: eu-west-1b
  NumberOfPings:
    Description: The number of times the health check will ping the alternate NAT Node
    Type: String
    Default: '3'
  PingTimeout:
    Description: >-
      The number of seconds to wait for each ping response before determining
      that the ping has failed
    Type: String
    Default: '1'
  WaitBetweenPings:
    Description: The number of seconds to wait between health checks
    Type: String
    Default: '2'
  WaitForInstanceStop:
    Description: >-
      The number of seconds to wait for alternate NAT Node to stop before
      attempting to stop it again
    Type: String
    Default: '60'
  WaitForInstanceStart:
    Description: >-
      The number of seconds to wait for alternate NAT Node to restart before
      resuming health checks again
    Type: String
    Default: '300'
  AwsNatAMI:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Description: >-
      SSM parameter path resolving to the AMI ID used for the NAT instances.
      The architecture (Arm or x86) must match NATNodeInstanceType: use
      /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 for x86 or
      /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2 for Arm.
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2
Resources:
  NATRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: NAT_Takeover
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 'ec2:DescribeInstances'
                  - 'ec2:DescribeInstanceStatus'
                  - 'ec2:DescribeRouteTables'
                  - 'ec2:CreateRoute'
                  - 'ec2:ReplaceRoute'
                  - 'ec2:StartInstances'
                  - 'ec2:StopInstances'
                  - 'ssm:DescribeAssociation'
                  - 'ssm:ListAssociations'
                  - 'ssm:GetDocument'
                  - 'ssm:ListInstanceAssociations'
                  - 'ssm:UpdateAssociationStatus'
                  - 'ssm:UpdateInstanceInformation'
                  - 'ssmmessages:CreateDataChannel'
                  - 'ssmmessages:OpenDataChannel'
                  - 'ssmmessages:OpenControlChannel'
                  - 'ssmmessages:CreateControlChannel'
                  - 'ec2messages:AcknowledgeMessage'
                  - 'ec2messages:DeleteMessage'
                  - 'ec2messages:FailMessage'
                  - 'ec2messages:GetEndpoint'
                  - 'ec2messages:GetMessages'
                  - 'ec2messages:SendReply'
                Resource: '*'
  NATRoleProfile:
    Type: 'AWS::IAM::InstanceProfile'
    Properties:
      Path: /
      Roles:
        - !Ref NATRole
  VPC:
    Type: 'AWS::EC2::VPC'
    Properties:
      CidrBlock: !Ref VpcCidr
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Public
  VPCIPv6CidrBlock:
    Type: 'AWS::EC2::VPCCidrBlock'
    Properties:
      AmazonProvidedIpv6CidrBlock: true
      VpcId: !Ref VPC
  PubSubnet1:
    Type: 'AWS::EC2::Subnet'
    DependsOn: VPCIPv6CidrBlock # Wait for IPv6 CIDR to be attached to VPC before creating subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Ref AvailabilityZone1
      CidrBlock: !Ref PublicSubnet1
      Ipv6CidrBlock: !Select [ 1, !Cidr [ !Select [ 0, !GetAtt 'VPC.Ipv6CidrBlocks' ], 256, 64 ] ]
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Public
        - Key: Name
          Value: 'Public Subnet #1'
  PriSubnet1:
    Type: 'AWS::EC2::Subnet'
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Ref AvailabilityZone1
      CidrBlock: !Ref PrivateSubnet1
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Private
        - Key: Name
          Value: 'Private Subnet #1'
  PubSubnet2:
    Type: 'AWS::EC2::Subnet'
    DependsOn: VPCIPv6CidrBlock # Wait for IPv6 CIDR to be attached to VPC before creating subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Ref AvailabilityZone2
      CidrBlock: !Ref PublicSubnet2
      Ipv6CidrBlock: !Select [ 2, !Cidr [ !Select [ 0, !GetAtt 'VPC.Ipv6CidrBlocks' ], 256, 64 ] ]
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Public
        - Key: Name
          Value: 'Public Subnet #2'
  PriSubnet2:
    Type: 'AWS::EC2::Subnet'
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Ref AvailabilityZone2
      CidrBlock: !Ref PrivateSubnet2
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Private
        - Key: Name
          Value: 'Private Subnet #2'
  InternetGateway:
    Type: 'AWS::EC2::InternetGateway'
    Properties:
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Public
  GatewayToInternet:
    Type: 'AWS::EC2::VPCGatewayAttachment'
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  PublicRouteTable:
    Type: 'AWS::EC2::RouteTable'
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Public
  PrivateRouteTable1:
    Type: 'AWS::EC2::RouteTable'
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Private
  PrivateRouteTable2:
    Type: 'AWS::EC2::RouteTable'
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Application
          Value: !Ref 'AWS::StackName'
        - Key: Network
          Value: Private
  PublicRoute:
    Type: 'AWS::EC2::Route'
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PrivateRoute1:
    Type: 'AWS::EC2::Route'
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      DestinationCidrBlock: 0.0.0.0/0
      InstanceId: !Ref NAT1Instance
  PrivateRoute2:
    Type: 'AWS::EC2::Route'
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      DestinationCidrBlock: 0.0.0.0/0
      InstanceId: !Ref NAT2Instance
  PubSubnet1RTAssoc:
    Type: 'AWS::EC2::SubnetRouteTableAssociation'
    Properties:
      SubnetId: !Ref PubSubnet1
      RouteTableId: !Ref PublicRouteTable
  PubSubnet2RTAssoc:
    Type: 'AWS::EC2::SubnetRouteTableAssociation'
    Properties:
      SubnetId: !Ref PubSubnet2
      RouteTableId: !Ref PublicRouteTable
  PriSubnet1RTAssoc:
    Type: 'AWS::EC2::SubnetRouteTableAssociation'
    Properties:
      SubnetId: !Ref PriSubnet1
      RouteTableId: !Ref PrivateRouteTable1
  PriSubnet2RTAssoc:
    Type: 'AWS::EC2::SubnetRouteTableAssociation'
    Properties:
      SubnetId: !Ref PriSubnet2
      RouteTableId: !Ref PrivateRouteTable2
  NAT1EIP:
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
      InstanceId: !Ref NAT1Instance
  NAT2EIP:
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
      InstanceId: !Ref NAT2Instance
  NAT1Instance:
    Type: 'AWS::EC2::Instance'
    Metadata:
      Comment1: 'Create NAT #1'
    Properties:
      InstanceType: !Ref NATNodeInstanceType
      KeyName: !Ref KeyName
      IamInstanceProfile: !Ref NATRoleProfile
      Ipv6AddressCount: 1
      SubnetId: !Ref PubSubnet1
      SourceDestCheck: 'false'
      ImageId: !Ref AwsNatAMI
      CreditSpecification:
        CPUCredits: standard
      SecurityGroupIds:
        - !Ref NATSecurityGroup
      Tags:
        - Key: Name
          Value: 'NAT #1'
      UserData:
        'Fn::Base64': !Sub |
          Content-Type: multipart/mixed; boundary="==BOUNDARY=="
          MIME-Version: 1.0

          --==BOUNDARY==
          Content-Type: text/x-shellscript; charset="us-ascii"

          #!/bin/bash
          # Configure iptables
          /sbin/iptables -t nat -A POSTROUTING -o eth0 -s 0.0.0.0/0 -j MASQUERADE
          /sbin/iptables-save > /etc/sysconfig/iptables
          # Configure ip forwarding and redirects
          echo 1 > /proc/sys/net/ipv4/ip_forward && echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
          mkdir -p /etc/sysctl.d/
          cat <<'EOF' >> /etc/sysctl.d/nat.conf
          net.ipv4.ip_forward = 1
          net.ipv4.conf.eth0.send_redirects = 0
          EOF
          mkdir /root/.aws
          cat <<'EOF' >> /root/.aws/config
          [default]
          output = text
          region = ${AWS::Region}
          EOF
          cat <<'EOF' >> /root/nat_monitor.sh
          #!/bin/bash
          # This script will monitor another NAT instance and take over its routes
          # if communication with the other instance fails
          # NAT instance variables
          # Other instance's IP to ping and route to grab if other node goes down
          NAT_ID=
          NAT_RT_ID=${PrivateRouteTable2}
          # My route to grab when I come back up
          My_RT_ID=${PrivateRouteTable1}
          # Health Check variables
          Num_Pings=${NumberOfPings}
          Ping_Timeout=${PingTimeout}
          Wait_Between_Pings=${WaitBetweenPings}
          Wait_for_Instance_Stop=${WaitForInstanceStop}
          Wait_for_Instance_Start=${WaitForInstanceStart}
          # Get this instance's ID
          Instance_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
          # Get the other NAT instance's IP
          NAT_IP=$(aws ec2 describe-instances --instance-ids $NAT_ID --query 'Reservations[0].Instances[0].PrivateIpAddress')
          echo `date` "-- Starting NAT monitor"
          echo `date` "-- Adding this instance to $My_RT_ID default route on start"
          aws ec2 replace-route --route-table-id $My_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
          # If replace-route failed, then the route might not exist and may need to be created instead
          if [ "$?" != "0" ]; then
              aws ec2 create-route --route-table-id $My_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
          fi
          while [ . ]; do
              # Check health of other NAT instance
              pingresult=`ping -c $Num_Pings -W $Ping_Timeout $NAT_IP | grep time= | wc -l`
              # Check to see if any of the health checks succeeded, if not
              if [ "$pingresult" == "0" ]; then
                  # Set HEALTHY variables to unhealthy (0)
                  ROUTE_HEALTHY=0
                  NAT_HEALTHY=0
                  STOPPING_NAT=0
                  while [ "$NAT_HEALTHY" == "0" ]; do
                      # NAT instance is unhealthy, loop while we try to fix it
                      if [ "$ROUTE_HEALTHY" == "0" ]; then
                          echo `date` "-- Other NAT heartbeat failed, taking over $NAT_RT_ID default route"
                          aws ec2 replace-route --route-table-id $NAT_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
                          ROUTE_HEALTHY=1
                      fi
                      # Check NAT state to see if we should stop it or start it again
                      NAT_STATE=$(aws ec2 describe-instances --instance-ids $NAT_ID --query 'Reservations[*].Instances[*].State.Name')
                      if [ "$NAT_STATE" == "stopped" ]; then
                          echo `date` "-- Other NAT instance stopped, starting it back up"
                          aws ec2 start-instances --instance-ids $NAT_ID
                          NAT_HEALTHY=1
                          sleep $Wait_for_Instance_Start
                      else
                          if [ "$STOPPING_NAT" == "0" ]; then
                              echo `date` "-- Other NAT instance $NAT_STATE, attempting to stop for reboot"
                              aws ec2 stop-instances --instance-ids $NAT_ID
                              STOPPING_NAT=1
                          fi
                          sleep $Wait_for_Instance_Stop
                      fi
                  done
              else
                  sleep $Wait_Between_Pings
              fi
          done
          EOF
          NAT_ID=
          # Wait for CloudFormation PrivateRouteTable2 update
          while [ "${!NAT_ID}" == "" ]; do
              sleep 60
              NAT_ID=$(aws ec2 describe-route-tables --route-table-ids ${PrivateRouteTable2} --query 'RouteTables[*].Routes[?DestinationCidrBlock==`0.0.0.0/0`].InstanceId')
          done
          sed -i "s/NAT_ID=/NAT_ID=${!NAT_ID}/" /root/nat_monitor.sh
          chmod a+x /root/nat_monitor.sh
          echo '@reboot /sbin/iptables-restore /etc/sysconfig/iptables' | crontab
          (crontab -l; echo '@reboot /root/nat_monitor.sh > /tmp/nat_monitor.log') | crontab
          /root/nat_monitor.sh > /tmp/nat_monitor.log &
          --==BOUNDARY==
  NAT2Instance:
    Type: 'AWS::EC2::Instance'
    Metadata:
      Comment1: 'Create NAT #2'
    Properties:
      InstanceType: !Ref NATNodeInstanceType
      KeyName: !Ref KeyName
      IamInstanceProfile: !Ref NATRoleProfile
      Ipv6AddressCount: 1
      SubnetId: !Ref PubSubnet2
      SourceDestCheck: 'false'
      ImageId: !Ref AwsNatAMI
      CreditSpecification:
        CPUCredits: standard
      SecurityGroupIds:
        - !Ref NATSecurityGroup
      Tags:
        - Key: Name
          Value: 'NAT #2'
      UserData:
        'Fn::Base64': !Sub |
          Content-Type: multipart/mixed; boundary="==BOUNDARY=="
          MIME-Version: 1.0

          --==BOUNDARY==
          Content-Type: text/x-shellscript; charset="us-ascii"

          #!/bin/bash
          # Configure iptables
          /sbin/iptables -t nat -A POSTROUTING -o eth0 -s 0.0.0.0/0 -j MASQUERADE
          /sbin/iptables-save > /etc/sysconfig/iptables
          # Configure ip forwarding and redirects
          echo 1 > /proc/sys/net/ipv4/ip_forward && echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
          mkdir -p /etc/sysctl.d/
          cat <<'EOF' >> /etc/sysctl.d/nat.conf
          net.ipv4.ip_forward = 1
          net.ipv4.conf.eth0.send_redirects = 0
          EOF
          mkdir /root/.aws
          cat <<'EOF' >> /root/.aws/config
          [default]
          output = text
          region = ${AWS::Region}
          EOF
          cat <<'EOF' >> /root/nat_monitor.sh
          #!/bin/bash
          # This script will monitor another NAT instance and take over its routes
          # if communication with the other instance fails
          # NAT instance variables
          # Other instance's IP to ping and route to grab if other node goes down
          NAT_ID=${NAT1Instance}
          NAT_RT_ID=${PrivateRouteTable1}
          # My route to grab when I come back up
          My_RT_ID=${PrivateRouteTable2}
          # Health Check variables
          Num_Pings=${NumberOfPings}
          Ping_Timeout=${PingTimeout}
          Wait_Between_Pings=${WaitBetweenPings}
          Wait_for_Instance_Stop=${WaitForInstanceStop}
          Wait_for_Instance_Start=${WaitForInstanceStart}
          # Get this instance's ID
          Instance_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
          # Get the other NAT instance's IP
          NAT_IP=$(aws ec2 describe-instances --instance-ids $NAT_ID --query 'Reservations[0].Instances[0].PrivateIpAddress')
          echo `date` "-- Starting NAT monitor"
          echo `date` "-- Adding this instance to $My_RT_ID default route on start"
          aws ec2 replace-route --route-table-id $My_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
          # If replace-route failed, then the route might not exist and may need to be created instead
          if [ "$?" != "0" ]; then
              aws ec2 create-route --route-table-id $My_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
          fi
          while [ . ]; do
              # Check health of other NAT instance
              pingresult=`ping -c $Num_Pings -W $Ping_Timeout $NAT_IP | grep time= | wc -l`
              # Check to see if any of the health checks succeeded, if not
              if [ "$pingresult" == "0" ]; then
                  # Set HEALTHY variables to unhealthy (0)
                  ROUTE_HEALTHY=0
                  NAT_HEALTHY=0
                  STOPPING_NAT=0
                  while [ "$NAT_HEALTHY" == "0" ]; do
                      # NAT instance is unhealthy, loop while we try to fix it
                      if [ "$ROUTE_HEALTHY" == "0" ]; then
                          echo `date` "-- Other NAT heartbeat failed, taking over $NAT_RT_ID default route"
                          aws ec2 replace-route --route-table-id $NAT_RT_ID --destination-cidr-block 0.0.0.0/0 --instance-id $Instance_ID
                          ROUTE_HEALTHY=1
                      fi
                      # Check NAT state to see if we should stop it or start it again
                      NAT_STATE=$(aws ec2 describe-instances --instance-ids $NAT_ID --query 'Reservations[*].Instances[*].State.Name')
                      if [ "$NAT_STATE" == "stopped" ]; then
                          echo `date` "-- Other NAT instance stopped, starting it back up"
                          aws ec2 start-instances --instance-ids $NAT_ID
                          NAT_HEALTHY=1
                          sleep $Wait_for_Instance_Start
                      else
                          if [ "$STOPPING_NAT" == "0" ]; then
                              echo `date` "-- Other NAT instance $NAT_STATE, attempting to stop for reboot"
                              aws ec2 stop-instances --instance-ids $NAT_ID
                              STOPPING_NAT=1
                          fi
                          sleep $Wait_for_Instance_Stop
                      fi
                  done
              else
                  sleep $Wait_Between_Pings
              fi
          done
          EOF
          chmod a+x /root/nat_monitor.sh
          echo '@reboot /sbin/iptables-restore /etc/sysconfig/iptables' | crontab
          (crontab -l; echo '@reboot /root/nat_monitor.sh > /tmp/nat_monitor.log') | crontab
          /root/nat_monitor.sh >> /tmp/nat_monitor.log &
          --==BOUNDARY==
  NATSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Rules for allowing access to HA Nodes
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '22'
          ToPort: '22'
          CidrIp: !Ref SshAllowedIp
          Description: SSH Allowed IP
        - IpProtocol: '-1'
          FromPort: '0'
          ToPort: '65535'
          CidrIp: !Ref VpcCidr
          Description: VPC Internal
      SecurityGroupEgress:
        - IpProtocol: '-1'
          FromPort: '0'
          ToPort: '65535'
          CidrIp: 0.0.0.0/0
          Description: All outbound traffic
  NATAllowICMP:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Properties:
      GroupId: !Ref NATSecurityGroup
      IpProtocol: icmp
      FromPort: '-1'
      ToPort: '-1'
      SourceSecurityGroupId: !Ref NATSecurityGroup
Outputs:
  NATSG:
    Description: 'NAT Security Group'
    Value: !Ref NATSecurityGroup
    Export:
      Name: NATSG
  NAT1:
    Description: 'NAT #1 EIP.'
    Value: !Join
      - ''
      - - !Ref NAT1Instance
        - ' ('
        - !Ref NAT1EIP
        - )
  NAT2:
    Description: 'NAT #2 EIP.'
    Value: !Join
      - ''
      - - !Ref NAT2Instance
        - ' ('
        - !Ref NAT2EIP
        - )
  VPCID:
    Description: VPC ID.
    Value: !Ref VPC
    Export:
      Name: VPCID
  PublicSubnet1:
    Description: 'Public Subnet #1.'
    Value: !Ref PubSubnet1
    Export:
      Name: PublicSubnet1
  PrivateSubnet1:
    Description: 'Private Subnet #1.'
    Value: !Ref PriSubnet1
    Export:
      Name: PrivateSubnet1
  PublicSubnet2:
    Description: 'Public Subnet #2.'
    Value: !Ref PubSubnet2
    Export:
      Name: PublicSubnet2
  PrivateSubnet2:
    Description: 'Private Subnet #2.'
    Value: !Ref PriSubnet2
    Export:
      Name: PrivateSubnet2
  AvailabilityZone1:
    Description: 'AZ #1.'
    Value: !Ref AvailabilityZone1
    Export:
      Name: AvailabilityZone1
  AvailabilityZone2:
    Description: 'AZ #2.'
    Value: !Ref AvailabilityZone2
    Export:
      Name: AvailabilityZone2
  PublicRouteTable:
    Description: Public Route Table.
    Value: !Join
      - ''
      - - !Ref PublicRouteTable
        - ' (0.0.0.0/0 -> '
        - !Ref InternetGateway
        - )
  PrivateRouteTable1:
    Description: 'Private Route Table #1.'
    Value: !Join
      - ''
      - - !Ref PrivateRouteTable1
        - ' (0.0.0.0/0 -> '
        - !Ref NAT1Instance
        - )
  PrivateRouteTable2:
    Description: 'Private Route Table #2.'
    Value: !Join
      - ''
      - - !Ref PrivateRouteTable2
        - ' (0.0.0.0/0 -> '
        - !Ref NAT2Instance
        - )
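To check which NAT instance currently holds a private route table's default route (for example after stopping one node to exercise the takeover), the same `describe-route-tables` query that the NAT #1 user-data uses can be run by hand. This is a sketch only: the route table ID is a placeholder to be replaced with the ID shown in the stack's PrivateRouteTable1 or PrivateRouteTable2 output, and `AWS_CLI` is a hypothetical override added here so the command can be printed rather than executed.

```shell
#!/bin/sh
# Sketch: print the instance ID behind a route table's IPv4 default route.
# AWS_CLI is a hypothetical override for dry runs (e.g. AWS_CLI=echo);
# rtb-0123456789abcdef0 below is a placeholder route table ID.
default_route_target() {
  route_table_id="$1"
  # Same JMESPath query as the template's user-data; --output text is passed
  # explicitly because a workstation may not have "output = text" configured.
  ${AWS_CLI:-aws} ec2 describe-route-tables \
    --route-table-ids "$route_table_id" \
    --query 'RouteTables[*].Routes[?DestinationCidrBlock==`0.0.0.0/0`].InstanceId' \
    --output text
}

# Dry run in a subshell: show the command that would be executed.
(AWS_CLI=echo; default_route_target rtb-0123456789abcdef0)
```

In normal operation each private route table should report its own NAT instance; while one node is down, both tables should report the surviving instance until the other node restarts and reclaims its route.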