We're facing unacceptably long deploy times on Elastic Beanstalk. This is a problem we've seen for years now.
To reiterate, we're very unhappy with AWS's performance here and are looking for a solution as soon as possible.
Deployments in our Java EB environment are so slow (26 minutes to update 8 instances) that we're finding it hard to work with EB. We've considered a number of alternatives, all of which seem to be either as slow as our current configuration or to have other drawbacks.
See below for specific questions and our current view of things. I've included a number of alternatives we considered.
We want to speed up our Elastic Beanstalk deployments. In particular, we want to be able to roll back changes quickly in case we find an issue. Today deployments are very slow (26 minutes for the Elastic Beanstalk update alone). This means that changes aren't rolled out quickly. More importantly, when we find an issue with code that was successfully deployed to production (i.e. a bug in the code), rolling back to the previous version takes an unacceptably long time.
Our goals are:
- Faster turnaround for rollbacks
- Faster deployment of changes
- Safety
Questions for AWS support:
- Are the deployment times we're seeing common? 26 minutes seems like a very long time to wait for what amounts to swapping out a Java JAR on the instances.
- Is there anything we need to change on our end to improve these times?
- Below I list a few alternatives. Are there any alternatives I'm missing?
- Investigating the alternatives is costly for us in terms of engineering manpower. Which of the alternatives listed would you suggest we look into?
Below I describe the status quo, as well as the alternatives we considered.
Currently we use the Rolling deployment policy with BatchSize=1.
Observations:
- According to the EB logs, the update took 26 minutes
2020-09-10 18:04:16 UTC+0200 INFO Environment update is starting.
2020-09-10 18:30:10 UTC+0200 INFO Environment update completed successfully.
- We're seeing mixed traffic for most of those 26 minutes. In other words, if you curl the endpoint, you get a mix of responses from the old code and responses from the new code.
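As a rough sanity check, the two log lines above put the update at 25 minutes 54 seconds, i.e. a little over 3 minutes per instance with BatchSize=1. The Python sketch below does that arithmetic and then, under the unverified assumption that each batch takes about as long regardless of how many instances it contains, estimates how larger batch sizes would change the total. The helper names are illustrative only.

```python
import math
from datetime import datetime, timedelta

# The two Elastic Beanstalk event timestamps quoted above.
START = "2020-09-10 18:04:16 UTC+0200"
END = "2020-09-10 18:30:10 UTC+0200"


def parse_eb_timestamp(stamp: str) -> datetime:
    """Parse an EB event timestamp such as '2020-09-10 18:04:16 UTC+0200'."""
    # strptime's %z understands '+0200' but not the 'UTC' prefix, so drop it.
    return datetime.strptime(stamp.replace("UTC", ""), "%Y-%m-%d %H:%M:%S %z")


def per_batch_seconds(elapsed: timedelta, instances: int, batch_size: int) -> float:
    """Average wall-clock time per batch for a rolling deployment."""
    batches = math.ceil(instances / batch_size)
    return elapsed.total_seconds() / batches


def estimate_seconds(per_batch: float, instances: int, batch_size: int) -> float:
    """Naive estimate: assumes every batch takes the same time, whatever its size."""
    return math.ceil(instances / batch_size) * per_batch


if __name__ == "__main__":
    elapsed = parse_eb_timestamp(END) - parse_eb_timestamp(START)
    per_batch = per_batch_seconds(elapsed, instances=8, batch_size=1)
    print(f"observed update: {elapsed}, about {per_batch / 60:.1f} min per batch")
    for size in (1, 2, 4):
        minutes = estimate_seconds(per_batch, instances=8, batch_size=size) / 60
        print(f"BatchSize={size}: about {minutes:.1f} min")
```

The linear model ignores that larger batches take more capacity out of service at once, so it only indicates an order of magnitude, not what EB would actually do.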
As an experiment, we spun up a new environment with 8 instances using the Immutable deployment policy.
Observations:
- According to the EB logs, the update took 24 minutes
- For most of those 24 minutes, we saw mixed traffic (see above).
- 14 of those 24 minutes were spent after the log entry "Deployment succeeded. Terminating old instances and temporary Auto Scaling group". During that phase we continued to see mixed traffic.
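To put a number on the mixed-traffic window seen in both experiments, a script can poll the endpoint (as with curl) and record which version answered each request. This sketch assumes, hypothetically, that the application exposes its version as the plain-text body of some URL; `fetch_version`, `sample` and `mixed_window` are illustrative names, not part of any AWS tooling.

```python
import time
import urllib.request
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (seconds since sampling started, version string)


def fetch_version(url: str, timeout: float = 5.0) -> str:
    # Hypothetical: assumes the URL returns the deployed version as plain text.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def sample(url: str, duration_s: float, interval_s: float = 1.0) -> List[Sample]:
    """Poll `url` for `duration_s` seconds, recording which version answered."""
    samples: List[Sample] = []
    start = time.monotonic()
    while (now := time.monotonic() - start) < duration_s:
        try:
            samples.append((now, fetch_version(url)))
        except OSError:  # URLError, HTTPError and timeouts all derive from OSError
            samples.append((now, "<error>"))
        time.sleep(interval_s)
    return samples


def mixed_window(samples: List[Sample], old: str, new: str) -> Optional[Tuple[float, float]]:
    """Return (first time `new` was seen, last time `old` was seen) if they overlap."""
    new_times = [t for t, v in samples if v == new]
    old_times = [t for t, v in samples if v == old]
    if not new_times or not old_times:
        return None
    first_new, last_old = min(new_times), max(old_times)
    # A clean cutover has every old response before the first new one.
    return (first_new, last_old) if first_new <= last_old else None
```

During a deployment one would call `sample(url, duration_s=1800)` against that hypothetical version URL and pass the result to `mixed_window` with the old and new version strings; the returned pair brackets the period in which both versions were answering.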
Risks and problems:
- There are known problems with changing both configuration and application version at the same time. It may be necessary to manually switch to Rolling for certain changes (see https://stackoverflow.com/a/48229931/239678). This may cause problems when deploying from CI, which is by definition automated.
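One way a CI job might avoid the manual switch is to change the deployment policy through the Elastic Beanstalk API as a separate, configuration-only update before deploying the new version. The sketch below uses boto3's `update_environment` and `describe_environments` with the `aws:elasticbeanstalk:command` option namespace. Whether splitting the change into two updates actually avoids the problems described in the linked answer is an untested assumption, and `deploy_with_policy` is a hypothetical helper.

```python
import time
from typing import Dict, List

COMMAND_NS = "aws:elasticbeanstalk:command"


def policy_settings(policy: str, batch_size: int = 1) -> List[Dict[str, str]]:
    """Option settings selecting an EB deployment policy (e.g. 'Rolling', 'Immutable')."""
    settings = [{"Namespace": COMMAND_NS, "OptionName": "DeploymentPolicy", "Value": policy}]
    if policy in ("Rolling", "RollingWithAdditionalBatch"):
        settings += [
            {"Namespace": COMMAND_NS, "OptionName": "BatchSizeType", "Value": "Fixed"},
            {"Namespace": COMMAND_NS, "OptionName": "BatchSize", "Value": str(batch_size)},
        ]
    return settings


def wait_until_ready(client, env_name: str, poll_s: float = 15.0) -> None:
    # Sketch only: assumes the status has already left 'Ready' once
    # update_environment returns; a real script should also check Health
    # and give up after a timeout.
    while True:
        env = client.describe_environments(EnvironmentNames=[env_name])["Environments"][0]
        if env["Status"] == "Ready":
            return
        time.sleep(poll_s)


def deploy_with_policy(client, env_name: str, version_label: str, policy: str) -> None:
    """`client` is an Elastic Beanstalk client, e.g. boto3.client("elasticbeanstalk")."""
    # Step 1: configuration-only update that sets the deployment policy.
    client.update_environment(EnvironmentName=env_name, OptionSettings=policy_settings(policy))
    wait_until_ready(client, env_name)
    # Step 2: application-version-only update under that policy.
    client.update_environment(EnvironmentName=env_name, VersionLabel=version_label)
    wait_until_ready(client, env_name)
```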
Question: is this any different than immutable deployments?
Question: is this likely to be faster than Alternative 2?
Blue/Green deployments work by keeping two (or more) environments active at the same time. A deployment creates a new environment. After successful creation, a CNAME switch causes traffic to be routed to the new environment.
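With Blue/Green, the cutover itself is a single API call. A minimal sketch using boto3's `swap_environment_cnames`, assuming two hypothetical environments named `env-blue` and `env-green` that are both running. Rolling back is the same swap performed again, so as long as the previous environment is still up it does not require redeploying the previous version (DNS caching, discussed below, still applies).

```python
from typing import Tuple


def swap(client, live_env: str, idle_env: str) -> Tuple[str, str]:
    """Point the live CNAME at `idle_env`; return the new (live, idle) pair.

    `client` is an Elastic Beanstalk client, e.g. boto3.client("elasticbeanstalk").
    """
    client.swap_environment_cnames(
        SourceEnvironmentName=live_env,
        DestinationEnvironmentName=idle_env,
    )
    return idle_env, live_env
```

Usage: `live, idle = swap(client, "env-blue", "env-green")` cuts over to the new environment, and calling `swap(client, live, idle)` again rolls back.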
Risks and problems:
- CNAME propagation feels risky. Clients may not honor DNS TTL settings and may continue requesting the old IP.
- As a consequence, we may still see traffic to the old instances hours or even days after a deployment.
These risks seem to rule out this option.
Are there any other AWS services that don't suffer from the problems described above?