pesterhazy/slow-deploys-eb.md

## slow-deploys-eb.md

      
We're facing unacceptably long deploy times on Elastic Beanstalk, a problem we've seen for years now.
To reiterate, we're very unhappy with AWS's performance here and seek a solution as soon as possible.
Deployments in our Java EB environment are so slow (26 minutes to update 8 instances) that we're finding it hard to work with EB. We've considered a number of alternatives, all of which seem to be either as slow as our current configuration or to have other drawbacks.
See below for specific questions and our current view of things. I've included a number of alternatives we considered.
### Summary

We want to speed up our Elastic Beanstalk deployments. In particular, we want to be able to roll back changes quickly in case we find an issue. Today, deployments are very slow (26 minutes for the Elastic Beanstalk update alone). This means that changes aren't rolled out quickly. More importantly, when we find an issue with code that was successfully deployed to production (i.e. a bug in the code), rolling back to the previous version takes an unacceptably long time.
Our goals are:

- Faster turnaround for rollbacks
- Faster deployment of changes
- Safety

### Questions for AWS support

- Are the deployment times we're seeing common? 26 minutes seems like a very long time to wait for what is essentially swapping out a Java JAR on each instance.
- Is there anything we need to change on our end to improve these times?
- Below I list a few alternatives. Are there any alternatives I'm missing?
- Investigating the alternatives is costly for us in terms of engineering manpower. Which of the alternatives listed would you suggest we look into?

Below I list the status quo as well as the alternatives we considered.
### Status quo: Rolling policy with BatchSize=1

Currently we use the Rolling deployment policy with BatchSize=1.
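For reference, these settings live in the `aws:elasticbeanstalk:command` namespace (`DeploymentPolicy`, `BatchSizeType`, `BatchSize`). The minimal sketch below builds them in the `OptionSettings` shape that boto3's `update_environment` accepts. The helper name `deployment_options` is hypothetical, and we assume `BatchSizeType=Fixed`, so that BatchSize=1 means one instance per batch.

```python
from typing import Dict, List

# Namespace holding the deployment-policy options in Elastic Beanstalk.
COMMAND_NAMESPACE = "aws:elasticbeanstalk:command"


def deployment_options(policy: str, batch_size_type: str, batch_size: int) -> List[Dict[str, str]]:
    """Build OptionSettings entries (boto3 shape) for a deployment policy.

    Hypothetical helper: policy is e.g. "Rolling", "RollingWithAdditionalBatch"
    or "Immutable"; batch_size_type is "Fixed" or "Percentage".
    """
    settings = {
        "DeploymentPolicy": policy,
        "BatchSizeType": batch_size_type,
        "BatchSize": str(batch_size),  # option values are passed as strings
    }
    return [
        {"Namespace": COMMAND_NAMESPACE, "OptionName": name, "Value": value}
        for name, value in settings.items()
    ]


# Status quo: Rolling, one instance per batch (assuming BatchSizeType=Fixed).
status_quo = deployment_options("Rolling", "Fixed", 1)
```

The resulting list could be passed as `OptionSettings=status_quo` to `boto3.client("elasticbeanstalk").update_environment(EnvironmentName=...)`; the environment name is left open because it isn't given here.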
Observations:

According to the EB logs, the update took 26 minutes:

```
2020-09-10 18:04:16 UTC+0200 INFO Environment update is starting.
2020-09-10 18:30:10 UTC+0200 INFO Environment update completed successfully.
```
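As a sanity check on the 26-minute figure, the two event timestamps can be subtracted directly. A minimal Python sketch, assuming every event line starts with `YYYY-MM-DD HH:MM:SS UTC+hhmm` as in the excerpt above:

```python
from datetime import datetime

# Event lines copied from the EB event log quoted above.
START = "2020-09-10 18:04:16 UTC+0200 INFO Environment update is starting."
END = "2020-09-10 18:30:10 UTC+0200 INFO Environment update completed successfully."


def event_time(line: str) -> datetime:
    """Parse the timestamp prefix of an EB event line, e.g. '2020-09-10 18:04:16 UTC+0200'."""
    date, time, zone = line.split()[:3]
    # strptime's %z accepts '+0200', so drop the literal 'UTC' prefix first.
    return datetime.strptime(f"{date} {time} {zone[3:]}", "%Y-%m-%d %H:%M:%S %z")


elapsed = event_time(END) - event_time(START)
print(elapsed)  # 0:25:54
```

That is 25 minutes 54 seconds, which rounds to the 26 minutes quoted. The same helper could time individual phases, such as the period after "Deployment succeeded. Terminating old instances..." in the Immutable experiment.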


We're seeing mixed traffic for most of those 26 minutes. In other words, if you repeatedly curl the endpoint, you see a mix of responses from the old code and from the new code.
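One way to quantify the mixed-traffic window is to poll the endpoint at a fixed interval during a deployment and record which version answered each request. The sketch below only analyses such samples. How the version is read from a response (for example, a version string in a health-check body) is an assumption, since nothing here says how old and new code can be told apart; `mixed_window` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# One probe result: (seconds since the deployment started, version that answered).
Sample = Tuple[float, str]


def mixed_window(samples: List[Sample]) -> Optional[Tuple[float, float]]:
    """Return (first sighting of the new version, last sighting of the old one).

    Assumes samples are sorted by time and that the first sample was served
    by the old version. Returns None if the two versions never overlapped.
    """
    if not samples:
        return None
    old_version = samples[0][1]
    new_times = [t for t, v in samples if v != old_version]
    if not new_times:
        return None  # the new version never answered
    first_new = new_times[0]
    last_old = max(t for t, v in samples if v == old_version)
    if last_old < first_new:
        return None  # clean cut-over: old code never answered after new code appeared
    return first_new, last_old


# Illustrative values only, not measurements from our environment.
probes = [(0, "old"), (60, "old"), (120, "new"), (180, "old"), (240, "new"), (300, "new")]
print(mixed_window(probes))  # (120, 180)
```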

### Alternative 1: Immutable Deployments

As an experiment, we spun up a new environment with 8 instances using the Immutable deployment policy.
Observations:

- According to the EB logs, the update took 24 minutes.
- For most of those 24 minutes, we saw mixed traffic (see above).
- 14 of those 24 minutes were spent after the log entry "Deployment succeeded. Terminating old instances and temporary Auto Scaling group". During that phase we continued to see mixed traffic.

Risks and problems:

There are known problems with changing both configuration and application version at the same time. It may be necessary to manually switch to Rolling for certain changes (see https://stackoverflow.com/a/48229931/239678). This may cause problems when deploying from CI, which is by definition automated.
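If that workaround is adopted, the CI job could pick the policy per deployment instead of someone switching it by hand. A hypothetical sketch, assuming the pipeline already knows whether a change touches environment configuration (the `config_changed` flag is our own assumption, not something EB provides); `policy_for_change` and `policy_option` are illustrative names.

```python
from typing import Dict, List

COMMAND_NAMESPACE = "aws:elasticbeanstalk:command"


def policy_for_change(config_changed: bool) -> str:
    """Pick a deployment policy for one CI run (hypothetical rule).

    Immutable for application-only changes; Rolling as the fallback when
    configuration changes too, following the Stack Overflow workaround.
    """
    return "Rolling" if config_changed else "Immutable"


def policy_option(config_changed: bool) -> List[Dict[str, str]]:
    """OptionSettings entry (boto3 shape) selecting the chosen policy."""
    return [{
        "Namespace": COMMAND_NAMESPACE,
        "OptionName": "DeploymentPolicy",
        "Value": policy_for_change(config_changed),
    }]
```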

### Alternative 2: RollingWithAdditionalBatch (BatchSize=100%)

Question: is this any different from Immutable deployments?
### Alternative 3: RollingWithAdditionalBatch (BatchSize=50%)

Question: is this likely to be faster than Alternative 2?
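A rough way to compare Alternatives 2 and 3 with the status quo is to count deployment batches for the 8-instance environment. This is a simplified model under two assumptions: per-batch time (update plus health checks) dominates the total, and a percentage batch size is rounded up to whole instances. We have not verified EB's exact rounding; `batch_count` is a hypothetical helper.

```python
import math


def batch_count(instances: int, batch_size_type: str, batch_size: int) -> int:
    """Number of batches needed to update `instances` instances.

    batch_size_type mirrors the EB option: "Fixed" (instances per batch) or
    "Percentage" (percent of the fleet per batch, rounded up here as an assumption).
    """
    if batch_size_type == "Fixed":
        per_batch = batch_size
    elif batch_size_type == "Percentage":
        per_batch = math.ceil(instances * batch_size / 100)
    else:
        raise ValueError(f"unknown batch size type: {batch_size_type}")
    per_batch = max(1, min(per_batch, instances))  # at least 1, at most the whole fleet
    return math.ceil(instances / per_batch)


for label, kind, size in [
    ("Rolling, BatchSize=1 (status quo)", "Fixed", 1),
    ("RollingWithAdditionalBatch, 100%", "Percentage", 100),
    ("RollingWithAdditionalBatch, 50%", "Percentage", 50),
]:
    print(f"{label}: {batch_count(8, kind, size)} batch(es)")
```

Under this model the status quo needs 8 sequential batches, so 26 minutes works out to roughly 3 minutes per batch, while 100% and 50% need 1 and 2 batches respectively. RollingWithAdditionalBatch also launches an extra batch of instances before updating, which this count does not model.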
### Alternative 4: Blue/Green Deployments

Blue/Green deployments work by keeping two (or more) environments active at the same time. A deployment creates a new environment. After successful creation, a CNAME switch causes traffic to be routed to the new environment.
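For reference, the CNAME switch itself is a single API call. A minimal sketch using boto3's `swap_environment_cnames`; the environment names are hypothetical placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any


def swap_cnames(eb_client: Any, live_env: str, candidate_env: str) -> None:
    """Swap CNAMEs so traffic for live_env's URL goes to candidate_env.

    eb_client is expected to be boto3.client("elasticbeanstalk"). Calling the
    function again with the same arguments swaps the CNAMEs back.
    """
    eb_client.swap_environment_cnames(
        SourceEnvironmentName=live_env,
        DestinationEnvironmentName=candidate_env,
    )


# Usage (hypothetical environment names):
# import boto3
# swap_cnames(boto3.client("elasticbeanstalk"), "myapp-blue", "myapp-green")
```

Swapping back would be the rollback path, but it is subject to the same DNS caching concerns as the original switch.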
Risks and problems:

- CNAME propagation feels risky: clients may not honor DNS TTL settings and may continue requesting the old IP.
- As a consequence, we may continue to see traffic to the old instances hours or even days after the deployment.

These risks seem to rule out this option.
### Alternative 5: Switch away from Elastic Beanstalk

Are there any other AWS services that don't suffer from the problems described above?