@armenr
Last active February 20, 2024 23:48
Wait for EC2 to Become Reachable

EC2 Wait Until Ready

This script is part of a broader library of utilities used alongside Terraform to make life better/easier for Ops & SRE.

Use-Case

Not everything begins and ends with Kubernetes. Sometimes you've got things to do directly on an EC2 instance. It (almost) always goes the same way:

  1. Create an instance
  2. Wait for that instance to "come online"
  3. Ensure that all default cloud-init scripts (and any other UserData) have executed to completion
  4. Do useful work 🫠

As it turns out, handling that particular case cleanly in Terraform is neither easy nor straightforward.

That's what this script is for.
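The wait-and-retry shape of steps 1–3 is easy to sketch in isolation. Below is a minimal, self-contained sketch of that pattern; probe here is a stand-in that only becomes "ready" after a few attempts, whereas the real script substitutes an SSM connectivity check:

```shell
set -euo pipefail

# Stand-in probe: succeeds once a counter file records 3 prior attempts.
# In the real script this is an `aws ssm start-session` attempt.
probe() {
  local count
  count=$(cat "${COUNT_FILE}")
  echo $((count + 1)) > "${COUNT_FILE}"
  (( count >= 3 ))
}

COUNT_FILE=$(mktemp)
echo 0 > "${COUNT_FILE}"

n=0
# Running the probe in the loop condition keeps a failed attempt from
# tripping `set -e`.
until probe; do
  if (( n >= 10 )); then
    echo "never became ready" >&2
    exit 1
  fi
  n=$((n + 1))
done

rm -f "${COUNT_FILE}"
echo "ready after ${n} retries"
```

The key design point is bounding the retries: without the `exit 1` branch, a dead instance would spin the loop forever inside a Terraform apply.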

Usage/Example

Let's say that you've got an EC2 instance you want to provision as soon as it's created. Let's also assume that you want to provision it with something like Ansible.

First, we create a null_resource:

# Blocks ansible run until new/all hosts are ready
resource "null_resource" "verify_instance_readiness" {

  # Run this always...any new instance and/or existing instance should always
  # be ready before the terraform run proceeds
  triggers = { always_run = timestamp() }

  provisioner "local-exec" {
    command = "${path.module}/ec2-wait-until-ready.sh ${aws_instance._.id}"
  }
}

Next, we create a second null_resource, which depends on the verify_instance_readiness null_resource:

resource "null_resource" "ansible" {

  # Triggers matter - we need to ensure we trigger on every possible change to
  # any relevant data, vars, or files
  # ‼️ 👉 This is just ONE *EXAMPLE* trigger, from an existing implementation ‼️
  triggers = {
    # trigger on changes to ansible vars or instance IDs
    instance_id           = aws_instance._.id
    # ...other triggers
  }

  # Block until instance readiness has been verified
  depends_on = [
    null_resource.verify_instance_readiness
    # ...other dependencies
  ]

  provisioner "local-exec" {
    command = <<-EOT
      ansible-playbook \
        --connection=aws_ssm \
        --inventory ${local_file.ansible_inventory.filename} \
        --extra-vars='${jsonencode(local.aspera_ansible_vars)}' \
      ${local_file.ansible_playbook.filename}
    EOT

    environment = {
      ANSIBLE_REMOTE_TEMP                 = "/tmp/.ansible/tmp"
      ANSIBLE_STDOUT_CALLBACK             = "yaml"
      AWS_PROFILE                         = var.aws_cli_profile
      OBJC_DISABLE_INITIALIZE_FORK_SAFETY = "YES"
    }
  }
}
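One detail worth calling out in the provisioner above: jsonencode(local.aspera_ansible_vars) renders the vars as a single JSON object, and ansible-playbook treats an --extra-vars value that begins with { as JSON rather than key=value pairs. A quick local sanity check of that quoting, using a hypothetical stand-in for the rendered vars:

```shell
set -euo pipefail

# Hypothetical stand-in for what jsonencode(local.aspera_ansible_vars)
# would render at plan time -- a single-line JSON object.
extra_vars='{"app_port":8080,"env_name":"dev"}'

# Validate that the string is well-formed JSON locally, before the
# provisioner hands it to ansible-playbook.
echo "${extra_vars}" | python3 -m json.tool > /dev/null
echo "extra-vars JSON is valid"
```

Wrapping the interpolation in single quotes, as the heredoc does, matters: the rendered JSON contains double quotes that would otherwise be eaten by the shell.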

Outcomes/Behavior

On First-Run/EC2 Instance Creation

  1. Your EC2 is created
  2. The first null_resource, verify_instance_readiness, runs
  3. It waits until all cloud-init and UserData scripts have run to completion
  4. It returns successfully
  5. Your next null_resource then runs, executing your provisioning steps --> Bash scripts, Ansible playbooks, etc.

On Subsequent terraform runs

  1. The null_resource is set to run every time because of triggers = { always_run = timestamp() }
  2. It runs
  3. It instantly connects to the existing EC2, sees that everything's great, and returns
  4. It amounts to a totally innocuous NOOP
#!/bin/bash
set -euo pipefail

# This is a simple script which waits until an EC2 instance is reachable via
# SSM Session Manager. This allows us to "block" our Ansible provisioner until
# the instance is ready.
#
# It is assumed that this script resides in the same directory as the
# Terraform module that uses it! It may also require some minor changes if you
# want to explicitly pass a region to the underlying aws command.
#
# Usage: ./ec2-wait-until-ready.sh <INSTANCE_ID>
# Tested with: Amazon Linux 2 + Terraform

instanceId=$1
n=0

echo "[*] WAIT_FOR_EC2: Checking instance connectivity and state..."

# Phase 1: retry until an SSM session can be opened against the instance.
# Note: `start-session` requires the AWS CLI Session Manager plugin. Running
# the aws call in the loop condition keeps a failed attempt from tripping
# `set -e`.
until aws ssm start-session --target "${instanceId}" >/dev/null 2>&1; do
  if [[ "${n}" -ge 10 ]]; then
    echo "[* ${instanceId}]: SSM connectivity could not be verified after ${n} attempts" >&2
    exit 1
  fi
  echo "[* ${instanceId}]: SSM Connectivity attempt #${n}"
  n=$((n + 1))
  sleep 5
done

echo "[* ${instanceId}]: SSM-session connectivity verified!"

# Phase 2: poll until cloud-init (and therefore all UserData) has finished.
tries=0
RESPONSE_CODE=1

while [[ "${RESPONSE_CODE}" != 0 && "${tries}" -le 50 ]]; do
  echo "[* ${instanceId}]: Checking if cloud-init is still running - attempt #${tries}"

  cmdId=$(
    aws ssm send-command \
      --document-name AWS-RunShellScript \
      --instance-ids "${instanceId}" \
      --parameters commands="sudo cloud-init status --wait > /dev/null 2>&1" \
      --query Command.CommandId \
      --output text \
      --no-paginate \
      --no-cli-pager
  )

  sleep 5

  # The invocation may not be queryable yet (or may still be in flight);
  # fall back to a non-zero sentinel instead of letting `set -e` kill the
  # script mid-poll.
  RESPONSE_CODE=$(
    aws ssm get-command-invocation \
      --command-id "${cmdId}" \
      --instance-id "${instanceId}" \
      --query ResponseCode \
      --output text \
      --no-paginate \
      --no-cli-pager || echo "None"
  )

  if [[ "${RESPONSE_CODE}" != 0 ]]; then
    echo "[* ${instanceId}]: cloud-init is still running. Retrying in 5 seconds..."
    sleep 5
  fi

  # `tries=$((tries + 1))` rather than `((tries++))`, which returns a
  # non-zero status when tries is 0 and would abort under `set -e`.
  tries=$((tries + 1))
done

if [[ "${RESPONSE_CODE}" != 0 ]]; then
  echo "[* ${instanceId}]: cloud-init did not finish within the retry budget" >&2
  exit 1
fi

echo "[* ${instanceId}]: response_code => ${RESPONSE_CODE}"
echo "[* ${instanceId}]: cloud-init is no longer running."
echo "[* ${instanceId}]: Let's get to work!"