We have an autoscaling group of m5.large instances with 250 GB EBS root volumes that we use for running our Buildkite test suites. This group scales up from 0 to 40 during the work day and then down again, so each day we see new instances.
Every few days, a few of the instances run very, very slowly. Our test suite either takes 100x its usual time or times out entirely. On the host machine, a basic command like aws s3api head-object --bucket blah --key blah will take 45 seconds.
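To make the symptom measurable on each new instance, the command above could be timed with something like the sketch below. This is only an illustration, not something we run today: the helper names (time_command, is_slow, check_head_object) and the 5-second threshold are assumptions, and the bucket and key are the same placeholders as above.

```python
import subprocess
import sys
import time


def time_command(argv, timeout_s=120.0):
    """Run argv and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def is_slow(elapsed_s, threshold_s=5.0):
    """Flag a run as slow. The 5 s default is an assumed threshold."""
    return elapsed_s > threshold_s


def check_head_object(bucket="blah", key="blah"):
    """Time the head-object call from the question (placeholder bucket/key)."""
    cmd = ["aws", "s3api", "head-object", "--bucket", bucket, "--key", key]
    elapsed, code = time_command(cmd)
    label = "SLOW" if is_slow(elapsed) else "ok"
    print(f"{label}: {elapsed:.1f}s (exit code {code})", file=sys.stderr)
    return elapsed, code
```

Calling check_head_object() at instance boot and again before each build would show whether an instance is slow from the start or degrades later.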
All instances are m5.large in us-east-1 running the latest Amazon Linux 2 AMI, ami-06631de3819cb42f3.