
@pesterhazy
Created September 11, 2020 11:18
Slow deploys on Elastic Beanstalk

We're facing unacceptably long deploy times on Elastic Beanstalk. This is a problem we've seen for years now.

To reiterate, we're very unhappy with AWS's performance here and seek a solution as soon as possible.

Deployments in our Java EB environment are so slow (26 minutes to update 8 instances) that we're finding it hard to work with EB. We've considered a number of alternatives, but each seems to be either as slow as our current configuration or to have other drawbacks.

See below for specific questions and our current view of things. I've included a number of alternatives we have considered.

Summary

We want to speed up our Elastic Beanstalk deployments. In particular, we want to be able to roll back changes quickly in case we find an issue. Today deployments are very slow (26 minutes for the Elastic Beanstalk update alone). This means that changes aren't rolled out quickly. More importantly, when we find an issue with code successfully deployed in production (i.e. a bug in the code), rolling back those changes to the previous version takes an unacceptably long time.

Our goals are:

  • Faster turnaround for rollbacks
  • Faster deployment of changes
  • Safety

Questions for AWS support:

  1. Are the deployment times we're seeing common? 26 minutes for a rollout seems like a very long time for simply swapping out a Java JAR on the instances.
  2. Is there anything we need to change on our end to improve these times?
  3. Below I list a few alternatives. Are there any alternatives I'm missing?
  4. Investigating the alternatives is costly for us in terms of engineering manpower. Which of the alternatives listed would you suggest we look into?

Below I list the current status quo, as well as a few alternatives considered.

Status quo: Rolling policy with BatchSize=1

Currently we use the Rolling deployment policy with BatchSize=1.
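For reference, the deployment policy is selected through options in the `aws:elasticbeanstalk:command` namespace (`DeploymentPolicy`, `BatchSizeType`, `BatchSize`). Below is a minimal sketch that assumes deployments are driven from a script. The hypothetical helper `deployment_option_settings` builds the `OptionSettings` entries in the shape the Elastic Beanstalk `UpdateEnvironment` API takes. The status quo and each alternative below then differ only in these values.

```python
from typing import Dict, List

NAMESPACE = "aws:elasticbeanstalk:command"
# Restricted to the policies discussed in this ticket (TrafficSplitting is left out).
POLICIES = {"AllAtOnce", "Rolling", "RollingWithAdditionalBatch", "Immutable"}
ROLLING_POLICIES = {"Rolling", "RollingWithAdditionalBatch"}


def deployment_option_settings(
    policy: str, batch_size_type: str = "Fixed", batch_size: int = 1
) -> List[Dict[str, str]]:
    """Build OptionSettings entries that select a deployment policy (hypothetical helper)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown deployment policy: {policy}")
    settings = [{"Namespace": NAMESPACE, "OptionName": "DeploymentPolicy", "Value": policy}]
    # BatchSizeType and BatchSize only matter for the rolling policies.
    if policy in ROLLING_POLICIES:
        if batch_size_type not in ("Fixed", "Percentage"):
            raise ValueError(f"unknown BatchSizeType: {batch_size_type}")
        settings.append({"Namespace": NAMESPACE, "OptionName": "BatchSizeType", "Value": batch_size_type})
        settings.append({"Namespace": NAMESPACE, "OptionName": "BatchSize", "Value": str(batch_size)})
    return settings


# Status quo: Rolling, one instance per batch.
status_quo = deployment_option_settings("Rolling", "Fixed", 1)
# Alternative 3: RollingWithAdditionalBatch, half of the fleet per batch.
alternative_3 = deployment_option_settings("RollingWithAdditionalBatch", "Percentage", 50)
```

With boto3, such a list could be passed as `OptionSettings` to the Elastic Beanstalk client's `update_environment`, alongside our own `EnvironmentName` and `VersionLabel`.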

Observations:

  • According to the EB logs, the update took 26 minutes:
      2020-09-10 18:04:16 UTC+0200 INFO Environment update is starting.
      2020-09-10 18:30:10 UTC+0200 INFO Environment update completed successfully.
  • We're seeing mixed traffic for most of those 26 minutes. In other words, if you curl the endpoint, you see a mix of responses from the old code and responses from the new code.
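The 26-minute figure is the gap between the two EB event timestamps quoted above. The following sketch recomputes that gap so the same measurement can be repeated for each alternative. It assumes event lines are copied in exactly the format shown (date, time, `UTC+HHMM` offset, level, message). The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def parse_eb_event_time(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS UTC+HHMM' timestamp of an EB event line."""
    date, time, zone = line.split()[:3]
    return datetime.strptime(f"{date} {time} {zone}", "%Y-%m-%d %H:%M:%S UTC%z")


def first_event_time(events: List[str], message: str) -> datetime:
    """Timestamp of the first event whose text contains `message`."""
    for line in events:
        if message in line:
            return parse_eb_event_time(line)
    raise ValueError(f"no event containing {message!r}")


def update_duration(events: List[str]) -> timedelta:
    start = first_event_time(events, "Environment update is starting")
    end = first_event_time(events, "Environment update completed successfully")
    return end - start


events = [
    "2020-09-10 18:04:16 UTC+0200 INFO Environment update is starting.",
    "2020-09-10 18:30:10 UTC+0200 INFO Environment update completed successfully.",
]
duration = update_duration(events)
print(duration)  # 0:25:54, i.e. the ~26 minutes reported above
```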

Alternative 1: Immutable Deployments

As an experiment, we spun up a new environment with 8 instances using the Immutable deployment policy.

Observations:

  • According to the EB logs, the update took 24 minutes.
  • For most of those 24 minutes, we saw mixed traffic (see above).
  • 14 of those 24 minutes were spent after the log entry "Deployment succeeded. Terminating old instances and temporary Auto Scaling group". During that phase we continued to see mixed traffic.
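Both experiments report "mixed traffic" from curling the endpoint. The sketch below turns that into a number. It assumes the endpoint returns a version string as its response body, which is a hypothetical convention our app would have to adopt. `poll_versions` samples the endpoint, and `mixed_traffic_window` reports the span from the first response by the new version to the last response by the old version seen after it.

```python
import time
import urllib.request
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (seconds since polling started, version in the response)


def poll_versions(url: str, duration_s: float, interval_s: float = 5.0) -> List[Sample]:
    """Poll `url` (assumed to return a version string) for `duration_s` seconds."""
    samples: List[Sample] = []
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                version = resp.read().decode().strip()
        except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
            version = "error"
        samples.append((elapsed, version))
        time.sleep(interval_s)
    return samples


def mixed_traffic_window(samples: List[Sample], old: str, new: str) -> Optional[Tuple[float, float]]:
    """(first time `new` was seen, last time `old` was seen after that), or None if no overlap."""
    ordered = sorted(samples)
    first_new = next((t for t, v in ordered if v == new), None)
    if first_new is None:
        return None
    last_old = max((t for t, v in ordered if v == old and t >= first_new), default=None)
    if last_old is None:
        return None
    return first_new, last_old


# Hypothetical samples: old and new code are both served between t=120 s and t=300 s.
example = [(0, "v1"), (60, "v1"), (120, "v2"), (180, "v1"), (240, "v2"), (300, "v1"), (360, "v2")]
window = mixed_traffic_window(example, "v1", "v2")  # (120, 300)
```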

Risks and problems:

  • There are known problems with changing both configuration and application version at the same time. It may be necessary to manually switch to Rolling for certain changes (see https://stackoverflow.com/a/48229931/239678). This may cause problems when deploying from CI, which is by definition automated.
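One way to keep CI automated despite this would be to let the pipeline pick the policy per deploy. Below is a minimal sketch with hypothetical helpers. We have not narrowed down which configuration changes are affected, so it conservatively falls back to Rolling whenever any option changes together with the application version. The current option values could come from something like boto3's `describe_configuration_settings`, which is an assumption about how we would wire it up.

```python
from typing import Dict, Tuple

OptionKey = Tuple[str, str]  # (namespace, option name)


def changed_options(current: Dict[OptionKey, str], desired: Dict[OptionKey, str]) -> Dict[OptionKey, str]:
    """Options whose desired value differs from (or is missing in) the current configuration."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def choose_policy(current: Dict[OptionKey, str], desired: Dict[OptionKey, str], version_changes: bool) -> str:
    # Assumption: any config change shipped together with a new version needs Rolling;
    # everything else goes out with Immutable.
    if changed_options(current, desired) and version_changes:
        return "Rolling"
    return "Immutable"


# Hypothetical environment property change.
key = ("aws:elasticbeanstalk:application:environment", "LOG_LEVEL")
current = {key: "INFO"}
desired = {key: "DEBUG"}
policy = choose_policy(current, desired, version_changes=True)  # "Rolling"
```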

Alternative 2: RollingWithAdditionalBatch (BatchSize=100%)

Question: is this any different from immutable deployments?

Alternative 3: RollingWithAdditionalBatch (BatchSize=50%)

Question: is this likely to be faster than Alternative 2?
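To frame this question, the back-of-envelope model below counts deployment steps and assumes every step costs the same time. It derives that per-step cost from the status quo (26 minutes over 8 single-instance batches). It also makes three further assumptions, none of them confirmed by AWS:

  • Percentage batch sizes round up.
  • RollingWithAdditionalBatch adds one extra step for the batch it launches first.
  • Step time is constant.

The Immutable experiment above replaced all instances and still took 24 minutes, so the constant-time assumption is clearly optimistic. The sketch only shows how the step count scales with batch size.

```python
import math


def batch_count(instances: int, batch_size_type: str, batch_size: int) -> int:
    """Number of batches needed to cover `instances` (assumes percentages round up)."""
    if batch_size_type == "Fixed":
        per_batch = batch_size
    elif batch_size_type == "Percentage":
        per_batch = math.ceil(instances * batch_size / 100)
    else:
        raise ValueError(f"unknown BatchSizeType: {batch_size_type}")
    per_batch = max(1, min(per_batch, instances))  # clamp to [1, instances]
    return math.ceil(instances / per_batch)


def deployment_steps(instances: int, batch_size_type: str, batch_size: int, additional_batch: bool) -> int:
    """Batches plus one extra step for the additional batch (an assumption)."""
    return batch_count(instances, batch_size_type, batch_size) + (1 if additional_batch else 0)


def naive_minutes(steps: int, minutes_per_step: float) -> float:
    """Linear model: every step costs the same fixed time."""
    return steps * minutes_per_step


# Per-step cost implied by the status quo: 26 minutes / 8 batches.
minutes_per_step = 26 / batch_count(8, "Fixed", 1)

for label, kind, size, extra in [
    ("Status quo", "Fixed", 1, False),
    ("Alternative 2", "Percentage", 100, True),
    ("Alternative 3", "Percentage", 50, True),
]:
    steps = deployment_steps(8, kind, size, extra)
    print(f"{label}: {steps} step(s), ~{naive_minutes(steps, minutes_per_step):.1f} min")
```

Under this model, Alternative 3 would take one more step than Alternative 2, not fewer. Whether that holds in practice is exactly what we are asking.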

Alternative 4: Blue/Green Deployments

Blue/Green deployments work by keeping two (or more) environments active at the same time. A deployment creates a new environment. After successful creation, a CNAME switch causes traffic to be routed to the new environment.

Risks and problems:

  • CNAME propagation feels risky. Clients may not honor DNS TTL settings and may continue requesting the old IP.
  • As a consequence, we may continue to see traffic to the old instances hours or even days after deployment.

These risks seem to rule out this option.
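The size of this risk could be measured rather than assumed. The sketch below assumes two inputs: request timestamps for the old environment (for example from its load balancer access logs) and the TTL of the environment's CNAME. It lists the requests that still reached the old environment after every client honoring the TTL should have moved. The function name, timestamps, and 60-second TTL are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def stragglers(old_env_requests: List[datetime], swap_time: datetime, ttl: timedelta) -> List[datetime]:
    """Requests to the old environment later than swap_time + ttl, in time order."""
    cutoff = swap_time + ttl
    return sorted(t for t in old_env_requests if t > cutoff)


# Hypothetical CNAME swap and request log.
swap = datetime(2020, 9, 10, 18, 0, tzinfo=timezone.utc)
requests = [
    swap - timedelta(minutes=1),   # before the swap
    swap + timedelta(seconds=30),  # within the TTL
    swap + timedelta(minutes=5),   # client ignoring the TTL
    swap + timedelta(hours=3),     # long-lived client
]
late = stragglers(requests, swap, ttl=timedelta(seconds=60))
longest_delay = late[-1] - swap if late else timedelta(0)
```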

Alternative 5: Switch away from Elastic Beanstalk

Are there any other AWS services that don't suffer from the problems described above?
